Preface


In the preface from the 1979 predecessor to this book, Hopcroft and Ullman 
marveled at the fact that the subject of automata had exploded, compared with 
its state at the time they wrote their first book, in 1969. Truly, the 1979 book 
contained many topics not found in the earlier work and was about twice its 
size. If you compare this book with the 1979 book, you will find that, like the 
automobiles of the 1970’s, this book is “larger on the outside, but smaller on 
the inside.” That sounds like a retrograde step, but we are happy with the 
changes for several reasons. 

First, in 1979, automata and language theory was still an area of active 
research. A purpose of that book was to encourage mathematically inclined 
students to make new contributions to the field. Today, there is little direct 
research in automata theory (as opposed to its applications), and thus little 
motivation for us to retain the succinct, highly mathematical tone of the 1979 
book. 

Second, the role of automata and language theory has changed over the 
past two decades. In 1979, automata was largely a graduate-level subject, and 
we imagined our readers were advanced graduate students, especially those
using the later chapters of the book. Today, the subject is a staple of the 
undergraduate curriculum. As such, the content of the book must assume less 
in the way of prerequisites from the student, and therefore must provide more 
of the background and details of arguments than did the earlier book. 

A third change in the environment is that Computer Science has grown to 
an almost unimaginable degree in the past three decades. While in 1979 it was 
often a challenge to fill up a curriculum with material that we felt would survive 
the next wave of technology, today very many subdisciplines compete for the 
limited amount of space in the undergraduate curriculum. 

Fourthly, CS has become a more vocational subject, and there is a severe 
pragmatism among many of its students. We continue to believe that aspects 
of automata theory are essential tools in a variety of new disciplines, and we 
believe that the theoretical, mind-expanding exercises embodied in the typical 
automata course retain their value, no matter how much the student prefers to 
learn only the most immediately monetizable technology. However, to assure 
a continued place for the subject on the menu of topics available to the
computer science student, we believe it is necessary to emphasize the applications
along with the mathematics. Thus, we have replaced a number of the more
abstruse topics in the earlier book with examples of how the ideas are used 
today. While applications of automata and language theory to compilers are 
now so well understood that they are normally covered in a compiler course, 
there are a variety of more recent uses, including model-checking algorithms 
to verify protocols and document-description languages that are patterned on 
context-free grammars. 


A final explanation for the simultaneous growth and shrinkage of the book 
is that we were today able to take advantage of the TeX and LaTeX typesetting
systems developed by Don Knuth and Les Lamport. The latter, especially, 
encourages the “open” style of typesetting that makes books larger, but easier 
to read. We appreciate the efforts of both men. 


Use of the Book 


This book is suitable for a quarter or semester course at the Junior level or 
above. At Stanford, we have used the notes in CS154, the course in automata 
and language theory. It is a one-quarter course, which both Rajeev and Jeff have 
taught. Because of the limited time available, Chapter 11 is not covered, and 
some of the later material, such as the more difficult polynomial-time reductions 
in Section 10.4, are omitted as well. The book’s Web site (see below) includes
notes and syllabi for several offerings of CS154. 


Some years ago, we found that many graduate students came to Stanford 
with a course in automata theory that did not include the theory of intractabil- 
ity. As the Stanford faculty believes that these ideas are essential for every 
computer scientist to know at more than the level of “NP-complete means it 
takes too long,” there is another course, CS154N, that students may take to 
cover only Chapters 8, 9, and 10. They actually participate in roughly the last 
third of CS154 to fulfill the CS154N requirement. Even today, we find several 
students each quarter availing themselves of this option. Since it requires little 
extra effort, we recommend the approach. 


Prerequisites 


To make best use of this book, students should have previously taken a course
covering discrete mathematics, e.g., graphs, trees, logic, and proof techniques. 
We assume also that they have had several courses in programming, and are 
familiar with common data structures, recursion, and the role of major system 
components such as compilers. These prerequisites should be obtained in a 
typical freshman-sophomore CS program. 
Exercises


The book contains extensive exercises, with some for almost every section. We 
indicate harder exercises or parts of exercises with an exclamation point. The 
hardest exercises have a double exclamation point. 

Some of the exercises or parts are marked with a star. For these exercises, 
we shall endeavor to maintain solutions accessible through the book’s Web page. 
These solutions are publicly available and should be used for self-testing. Note 
that in a few cases, one exercise B asks for modification or adaptation of your 
solution to another exercise A. If certain parts of A have solutions, then you 
should expect the corresponding parts of B to have solutions as well. 


Gradiance On-Line Homeworks 


A new feature of the third edition is that there is an accompanying set of on-line 
homeworks using a technology developed by Gradiance Corp. Instructors may 
assign these homeworks to their class, or students not enrolled in a class may 
enroll in an “omnibus class” that allows them to do the homeworks as a tutorial 
(without an instructor-created class). Gradiance questions look like ordinary 
questions, but your solutions are sampled. If you make an incorrect choice you 
are given specific advice or feedback to help you correct your solution. If your 
instructor permits, you are allowed to try again, until you get a perfect score. 

A subscription to the Gradiance service is offered with all new copies of this 
text sold in North America. For more information, visit the Addison-Wesley 
web site www.aw.com/gradiance or send email to computing@aw.com.


Support on the World Wide Web 


The book’s home page is 
http://www-db.stanford.edu/~ullman/ialc.html 


Here are solutions to starred exercises, errata as we learn of them, and backup 
materials. We hope to make available the notes for each offering of CS154 as 
we teach it, including homeworks, solutions, and exams. 
Chapter 1


Automata: The Methods 
and the Madness 


Automata theory is the study of abstract computing devices, or “machines.” 
Before there were computers, in the 1930’s, A. Turing studied an abstract
machine that had all the capabilities of today’s computers, at least as far as
what they could compute. Turing’s goal was to describe precisely the boundary
between what a computing machine could do and what it could not do; his 
conclusions apply not only to his abstract Turing machines, but to today’s real 
machines. 

In the 1940’s and 1950’s, simpler kinds of machines, which we today call 
“finite automata,” were studied by a number of researchers. These automata, 
originally proposed to model brain function, turned out to be extremely useful 
for a variety of other purposes, which we shall mention in Section 1.1. Also in 
the late 1950’s, the linguist N. Chomsky began the study of formal “grammars.” 
While not strictly machines, these grammars have close relationships to abstract 
automata and serve today as the basis of some important software components, 
including parts of compilers. 

In 1969, S. Cook extended Turing’s study of what could and what could 
not be computed. Cook was able to separate those problems that can be solved 
efficiently by computer from those problems that can in principle be solved, but 
in practice take so much time that computers are useless for all but very small 
instances of the problem. The latter class of problems is called “intractable,” 
or “NP-hard.” It is highly unlikely that even the exponential improvement in 
computing speed that computer hardware has been following (“Moore’s Law”) 
will have significant impact on our ability to solve large instances of intractable 
problems. 

All of these theoretical developments bear directly on what computer scien- 
tists do today. Some of the concepts, like finite automata and certain kinds of 
formal grammars, are used in the design and construction of important kinds 
of software. Other concepts, like the Turing machine, help us understand what 
we can expect from our software. Especially, the theory of intractable problems
lets us deduce whether we are likely to be able to meet a problem “head-on” 
and write a program to solve it (because it is not in the intractable class), or 
whether we have to find some way to work around the intractable problem: 
find an approximation, use a heuristic, or use some other method to limit the 
amount of time the program will spend solving the problem. 

In this introductory chapter, we begin with a very high-level view of what 
automata theory is about, and what its uses are. Much of the chapter is de- 
voted to a survey of proof techniques and tricks for discovering proofs. We cover 
deductive proofs, reformulating statements, proofs by contradiction, proofs by 
induction, and other important concepts. A final section introduces the con- 
cepts that pervade automata theory: alphabets, strings, and languages. 


1.1 Why Study Automata Theory? 


There are several reasons why the study of automata and complexity is an 
important part of the core of Computer Science. This section serves to introduce 
the reader to the principal motivation and also outlines the major topics covered 
in this book. 


1.1.1 Introduction to Finite Automata 


Finite automata are a useful model for many important kinds of hardware and 
software. We shall see, starting in Chapter 2, examples of how the concepts are 
used. For the moment, let us just list some of the most important kinds: 


1. Software for designing and checking the behavior of digital circuits. 


2. The “lexical analyzer” of a typical compiler, that is, the compiler com- 
ponent that breaks the input text into logical units, such as identifiers, 
keywords, and punctuation. 


3. Software for scanning large bodies of text, such as collections of Web 
pages, to find occurrences of words, phrases, or other patterns. 


4. Software for verifying systems of all types that have a finite number of 
distinct states, such as communications protocols or protocols for secure 
exchange of information. 


While we shall soon meet a precise definition of automata of various types, 
let us begin our informal introduction with a sketch of what a finite automaton 
is and does. There are many systems or components, such as those enumerated 
above, that may be viewed as being at all times in one of a finite number 
of “states.” The purpose of a state is to remember the relevant portion of the 
system’s history. Since there are only a finite number of states, the entire history 
generally cannot be remembered, so the system must be designed carefully, to 
remember what is important and forget what is not. The advantage of having
only a finite number of states is that we can implement the system with a fixed 
set of resources. For example, we could implement it in hardware as a circuit, or 
as a simple form of program that can make decisions looking only at a limited 
amount of data or using the position in the code itself to make the decision. 


Example 1.1: Perhaps the simplest nontrivial finite automaton is an on/off 
switch. The device remembers whether it is in the “on” state or the “off” state, 
and it allows the user to press a button whose effect is different, depending on 
the state of the switch. That is, if the switch is in the off state, then pressing 
the button changes it to the on state, and if the switch is in the on state, then 
pressing the same button turns it to the off state. 


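[Figure: two states, off and on, with the Start arrow pointing to off; an arc labeled Push leads from off to on, and another leads from on back to off.]

Figure 1.1: A finite automaton modeling an on/off switch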


The finite-automaton model for the switch is shown in Fig. 1.1. As for all 
finite automata, the states are represented by circles; in this example, we have 
named the states on and off. Arcs between states are labeled by “inputs,” which 
represent external influences on the system. Here, both arcs are labeled by the 
input Push, which represents a user pushing the button. The intent of the two 
arcs is that whichever state the system is in, when the Push input is received 
it goes to the other state. 

One of the states is designated the “start state,” the state in which the 
system is placed initially. In our example, the start state is off, and we conven- 
tionally indicate the start state by the word Start and an arrow leading to that 
state. 

It is often necessary to indicate one or more states as “final” or “accepting” 
states. Entering one of these states after a sequence of inputs indicates that 
the input sequence is good in some way. For instance, we could have regarded 
the state on in Fig. 1.1 as accepting, because in that state, the device being 
controlled by the switch will operate. It is conventional to designate accepting 
states by a double circle, although we have not made any such designation in 
Fig. 1.1. 
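The switch of Fig. 1.1 is small enough to write down directly as a program. The following is a minimal sketch, not a standard encoding: the names TRANSITIONS, run, and accepts are our own, and treating on as accepting is the optional designation mentioned above.

```python
# Transition function of the on/off switch: (state, input) -> next state.
TRANSITIONS = {
    ("off", "Push"): "on",
    ("on", "Push"): "off",
}
START = "off"
ACCEPTING = {"on"}  # optional choice: regard "on" as accepting


def run(inputs):
    """Feed a sequence of inputs to the switch and return the state it ends in."""
    state = START
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state


def accepts(inputs):
    """True if the input sequence leaves the switch in an accepting state."""
    return run(inputs) in ACCEPTING


print(run(["Push"]))          # on
print(run(["Push", "Push"]))  # off
print(accepts(["Push"] * 3))  # True
```

A dictionary keyed by (state, input) pairs is just one convenient way to record the arcs; an input other than Push raises a KeyError here, since the figure defines no transition for it.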


Example 1.2: Sometimes, what is remembered by a state can be much more 
complex than an on/off choice. Figure 1.2 shows another finite automaton that 
could be part of a lexical analyzer. The job of this automaton is to recognize 
the keyword then. It thus needs five states, each of which represents a different 
position in the word then that has been reached so far. These positions correspond
to the prefixes of the word, ranging from the empty string (i.e., nothing
of the word has been seen so far) to the complete word.


[Figure: five states named by the empty prefix, t, th, the, and then, with the Start arrow at the empty prefix and arcs labeled t, h, e, and n leading from each prefix to the next.]

Figure 1.2: A finite automaton modeling recognition of then


In Fig. 1.2, the five states are named by the prefix of then seen so far. Inputs 
correspond to letters. We may imagine that the lexical analyzer examines one 
character of the program that it is compiling at a time, and the next character 
to be examined is the input to the automaton. The start state corresponds to 
the empty string, and each state has a transition on the next letter of then to 
the state that corresponds to the next-larger prefix. The state named then is 
entered when the input has spelled the word then. Since it is the job of this 
automaton to recognize when then has been seen, we could consider that state 
the lone accepting state. 
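The automaton of Fig. 1.2 can be sketched in a few lines if we use the prefix seen so far as the name of the state, exactly as the figure does. The function name recognizes_then is hypothetical, and so is the treatment of any character not shown in the figure: here such a character simply causes rejection, whereas a real lexical analyzer would usually go on to consider other token types.

```python
KEYWORD = "then"


def recognizes_then(text):
    """Return True if text is exactly the keyword 'then'.

    The state is the prefix of 'then' seen so far, from '' (the start
    state) to 'then' (the lone accepting state). A character with no
    transition out of the current state causes rejection (an assumption;
    Fig. 1.2 shows only the forward transitions).
    """
    state = ""  # start state: the empty prefix
    for ch in text:
        next_state = state + ch
        if not KEYWORD.startswith(next_state):
            return False  # no transition on ch from this state
        state = next_state
    return state == KEYWORD


for word in ["then", "the", "than", "thenx"]:
    print(word, recognizes_then(word))
```

Because only the five prefixes of then can ever be a state, the program uses a fixed, finite amount of memory, which is the point of the finite-automaton model.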


1.1.2 Structural Representations 


There are two important notations that are not automaton-like, but play an 
important role in the study of automata and their applications. 


1. Grammars are useful models when designing software that processes data 
with a recursive structure. The best-known example is a “parser,” the 
component of a compiler that deals with the recursively nested features 
of the typical programming language, such as expressions — arithmetic, 
conditional, and so on. For instance, a grammatical rule like $E \rightarrow E + E$
states that an expression can be formed by taking any two expressions 
and connecting them by a plus sign; this rule is typical of how expressions 
of real programming languages are formed. We introduce context-free 
grammars, as they are usually called, in Chapter 5. 


2. Regular Expressions also denote the structure of data, especially text 
strings. As we shall see in Chapter 3, the patterns of strings they describe 
are exactly the same as what can be described by finite automata. The 
style of these expressions differs significantly from that of grammars, and 
we shall content ourselves with a simple example here. The UNIX-style
regular expression '[A-Z][a-z]*[ ][A-Z][A-Z]' represents capitalized
words followed by a space and two capital letters. This expression
represents patterns in text that could be a city and state, e.g., Ithaca NY.
It misses multiword city names, such as Palo Alto CA, which could be
captured by the more complex expression


'[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z]'
When interpreting such expressions, we only need to know that [A-Z]
represents a range of characters from capital “A” to capital “Z” (i.e., any 
capital letter), and [ ] is used to represent the blank character alone. 
Also, the symbol * represents “any number of” the preceding expression. 
Parentheses are used to group components of the expression; they do not 
represent characters of the text described. 
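Python's re module writes the constructs these two expressions need (the character classes [A-Z], [a-z], and [ ], the star, and parentheses for grouping) the same way, so the expressions can be tried directly. The sketch below is only an illustration; fullmatch asks whether the whole string fits the pattern.

```python
import re

# Capitalized word, a blank, then two capital letters.
SIMPLE = re.compile(r"[A-Z][a-z]*[ ][A-Z][A-Z]")
# Same, but allowing further capitalized words, each preceded by a blank.
MULTI = re.compile(r"[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z]")

for text in ["Ithaca NY", "Palo Alto CA"]:
    print(text, bool(SIMPLE.fullmatch(text)), bool(MULTI.fullmatch(text)))
# Ithaca NY True True
# Palo Alto CA False True

# Searching inside the text instead of matching all of it:
print(SIMPLE.search("Palo Alto CA").group())  # Alto CA
```

With re.search, the simpler expression still finds something in Palo Alto CA, but only the substring Alto CA, which is exactly how it misses the multiword city name.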


1.1.3 Automata and Complexity 


Automata are essential for the study of the limits of computation. As we 
mentioned in the introduction to the chapter, there are two important issues: 


1. What can a computer do at all? This study is called “decidability,” and 
the problems that can be solved by computer are called “decidable.” This 
topic is addressed in Chapter 9. 


2. What can a computer do efficiently? This study is called “intractabil- 
ity,” and the problems that can be solved by a computer using no more 
time than some slowly growing function of the size of the input are called 
“tractable.” Often, we take all polynomial functions to be “slowly grow- 
ing,” while functions that grow faster than any polynomial are deemed to 
grow too fast. The subject is studied in Chapter 10. 
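To see why every polynomial eventually counts as "slowly growing" next to an exponential, the hypothetical helper crossover(k) below searches for the point from which $2^n$ stays above $n^k$ for every n up to a search limit. Because the search is finite, it illustrates the trend rather than proving it.

```python
def crossover(k, search_limit=1000):
    """Smallest n such that 2**m > m**k for every m with n <= m <= search_limit.

    Only values m >= 2 are examined, and the search is finite, so the
    result illustrates the trend rather than proving it.
    """
    last_bad = 1
    for m in range(2, search_limit + 1):
        if 2 ** m <= m ** k:
            last_bad = m  # the polynomial is still at least as large here
    return last_bad + 1


for k in (2, 3, 5, 10):
    print(k, crossover(k))
```

For example, crossover(3) is 10, since $2^9 = 512 < 729 = 9^3$ but $2^{10} = 1024 > 1000 = 10^3$; raising the exponent k only postpones the crossover, it never prevents it.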


1.2 Introduction to Formal Proof 


If you studied plane geometry in high school any time before the 1990’s, you 
most likely had to do some detailed “deductive proofs,” where you showed 
the truth of a statement by a detailed sequence of steps and reasons. While 
geometry has its practical side (e.g., you need to know the rule for computing 
the area of a rectangle if you need to buy the correct amount of carpet for a 
room), the study of formal proof methodologies was at least as important a 
reason for covering this branch of mathematics in high school. 

In the USA of the 1990’s it became popular to teach proof as a matter 
of personal feelings about the statement. While it is good to feel the truth 
of a statement you need to use, important techniques of proof are no longer 
mastered in high school. Yet proof is something that every computer scientist 
needs to understand. Some computer scientists take the extreme view that a 
formal proof of the correctness of a program should go hand-in-hand with the 
writing of the program itself. We doubt that doing so is productive. On the 
other hand, there are those who say that proof has no place in the discipline of 
programming. The slogan “if you are not sure your program is correct, run it 
and see” is commonly offered by this camp. 

Our position is between these two extremes. Testing programs is surely 
essential. However, testing goes only so far, since you cannot try your program 
on every input. More importantly, if your program is complex (say, a tricky
recursion or iteration), then if you don’t understand what is going on as you
go around a loop or call a function recursively, it is unlikely that you will write 
the code correctly. When your testing tells you the code is incorrect, you still 
need to get it right. 

To make your iteration or recursion correct, you need to set up an inductive 
hypothesis, and it is helpful to reason, formally or informally, that the hypoth- 
esis is consistent with the iteration or recursion. This process of understanding 
the workings of a correct program is essentially the same as the process of prov- 
ing theorems by induction. Thus, in addition to giving you models that are 
useful for certain types of software, it has become traditional for a course on 
automata theory to cover methodologies of formal proof. Perhaps more than 
other core subjects of computer science, automata theory lends itself to natural 
and interesting proofs, both of the deductive kind (a sequence of justified steps) 
and the inductive kind (recursive proofs of a parameterized statement that use 
the statement itself with “lower” values of the parameter). 
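As a small illustration of setting up an inductive hypothesis for an iteration, consider a loop that sums 0 through n. The function sum_up_to and its hypothesis are our own example, not from the text. The assert checks the hypothesis each time around the loop, which is testing rather than proof: it covers only the inputs we actually run.

```python
def sum_up_to(n):
    """Return 0 + 1 + ... + n, checking an inductive hypothesis on each pass."""
    total = 0
    for i in range(n + 1):
        # Inductive hypothesis: before adding i, total == 0 + 1 + ... + (i - 1).
        assert total == i * (i - 1) // 2
        total += i
    # The hypothesis with i = n + 1 gives the closed form for the result.
    assert total == n * (n + 1) // 2
    return total


print(sum_up_to(10))
```

The hypothesis holds trivially when i = 0 (the basis), and if it holds before adding i, it holds for i + 1 afterward (the inductive step); proving those two facts is what makes the loop correct for every n, not just the values tested.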


1.2.1 Deductive Proofs 


As mentioned above, a deductive proof consists of a sequence of statements 
whose truth leads us from some initial statement, called the hypothesis or the 
given statement(s), to a conclusion statement. Each step in the proof must 
follow, by some accepted logical principle, from either the given facts, or some 
of the previous statements in the deductive proof, or a combination of these. 

The hypothesis may be true or false, typically depending on values of its 
parameters. Often, the hypothesis consists of several independent statements 
connected by a logical AND. In those cases, we talk of each of these statements 
as a hypothesis, or as a given statement. 

The theorem that is proved when we go from a hypothesis H to a conclusion 
C is the statement “if H then C.” We say that C is deduced from H. An example 
theorem of the form “if H then C” will illustrate these points. 


Theorem 1.3: If $x \ge 4$, then $2^x \ge x^2$.


It is not hard to convince ourselves informally that Theorem 1.3 is true, 
although a formal proof requires induction and will be left for Example 1.17. 
First, notice that the hypothesis H is “$x \ge 4$.” This hypothesis has a parameter,
x, and thus is neither true nor false. Rather, its truth depends on the value of 
the parameter x; e.g., H is true for x = 6 and false for x = 2. 

Likewise, the conclusion C is “$2^x \ge x^2$.” This statement also uses parameter
x and is true for certain values of x and not others. For example, C is false for
x = 3, since $2^3 = 8$, which is not as large as $3^2 = 9$. On the other hand, C is
true for x = 4, since $2^4 = 4^2 = 16$. For x = 5, the statement is also true, since
$2^5 = 32$ is at least as large as $5^2 = 25$.

Perhaps you can see the intuitive argument that tells us the conclusion
$2^x \ge x^2$ will be true whenever $x \ge 4$. We already saw that it is true for x = 4.
As x grows larger than 4, the left side, $2^x$, doubles each time x increases by 1.
However, the right side, $x^2$, grows by the ratio $\left(\frac{x+1}{x}\right)^2$. If $x \ge 4$, then
(x + 1)/x cannot be greater than 1.25, and therefore $\left(\frac{x+1}{x}\right)^2$ cannot be bigger
than 1.5625. Since 1.5625 < 2, each time x increases above 4 the left side $2^x$
grows more than the right side $x^2$. Thus, as long as we start from a value like
x = 4 where the inequality $2^x \ge x^2$ is already satisfied, we can increase x as
much as we like, and the inequality will still be satisfied.

We have now completed an informal but accurate proof of Theorem 1.3. We 
shall return to the proof and make it more precise in Example 1.17, after we 
introduce “inductive” proofs. 
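The informal argument can be spot-checked mechanically. The sketch below (the function name check_theorem_1_3 is hypothetical) tests $2^x \ge x^2$ for every integer x from 4 up to a chosen limit, and also checks the key step that $\left(\frac{x+1}{x}\right)^2 \le 1.5625 < 2$, using exact fractions to avoid floating-point rounding. A finite check like this is evidence, not a proof; the proof is the induction of Example 1.17.

```python
from fractions import Fraction


def check_theorem_1_3(limit):
    """Check 2**x >= x**2 and the growth-ratio step for 4 <= x <= limit."""
    for x in range(4, limit + 1):
        assert 2 ** x >= x ** 2
        # The right side grows by ((x+1)/x)**2 <= 1.5625 < 2; the left side doubles.
        ratio = Fraction(x + 1, x) ** 2
        assert ratio <= Fraction(25, 16) < 2
    return True


print(check_theorem_1_3(200))                          # True
print([x for x in range(0, 8) if 2 ** x < x ** 2])     # [3]
```

The second line confirms the remark above that the conclusion fails at x = 3, and only there among the small values tried.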

Theorem 1.3, like all interesting theorems, involves an infinite number of
related facts, in this case the statement “if $x \ge 4$ then $2^x \ge x^2$” for all integers
x. In fact, we do not need to assume x is an integer, but the proof talked about
repeatedly increasing x by 1, starting at x = 4, so we really addressed only the
situation where x is an integer.

Theorem 1.3 can be used to help deduce other theorems. In the next ex- 
ample, we consider a complete deductive proof of a simple theorem that uses 
Theorem 1.3. 


Theorem 1.4: If x is the sum of the squares of four positive integers, then
$2^x \ge x^2$.

PROOF: The intuitive idea of the proof is that if the hypothesis is true for x, 
that is, x is the sum of the squares of four positive integers, then x must be at
least 4. Therefore, the hypothesis of Theorem 1.3 holds, and since we believe 
that theorem, we may state that its conclusion is also true for x. The reasoning 
can be expressed as a sequence of steps. Each step is either the hypothesis of 
the theorem to be proved, part of that hypothesis, or a statement that follows 
from one or more previous statements. 

By “follows” we mean that if the hypothesis of some theorem is a previous 
statement, then the conclusion of that theorem is true, and can be written down 
as a statement of our proof. This logical rule is often called modus ponens; i.e., 
if we know H is true, and we know “if H then C” is true, we may conclude 
that C is true. We also allow certain other logical steps to be used in creating 
a statement that follows from one or more previous statements. For instance, 
if A and B are two previous statements, then we can deduce and write down 
the statement “A and B.” 

Figure 1.3 shows the sequence of statements we need to prove Theorem 1.4. 
While we shall not generally prove theorems in such a stylized form, it helps to 
think of proofs as very explicit lists of statements, each with a precise justifica- 
tion. In step (1), we have repeated one of the given statements of the theorem: 
that x is the sum of the squares of four integers. It often helps in proofs if we 
name quantities that are referred to but not named, and we have done so here, 
giving the four integers the names a, b, c, and d. 

In step (2), we put down the other part of the hypothesis of the theorem: 
that the values being squared are each at least 1. Technically, this statement 
represents four distinct statements, one for each of the four integers involved. 
      Statement                                             Justification
1.    $x = a^2 + b^2 + c^2 + d^2$                           Given
2.    $a \ge 1$; $b \ge 1$; $c \ge 1$; $d \ge 1$            Given
3.    $a^2 \ge 1$; $b^2 \ge 1$; $c^2 \ge 1$; $d^2 \ge 1$    (2) and properties of arithmetic
4.    $x \ge 4$                                             (1), (3), and properties of arithmetic
5.    $2^x \ge x^2$                                         (4) and Theorem 1.3


Figure 1.3: A formal proof of Theorem 1.4


Then, in step (3) we observe that if a number is at least 1, then its square is 
also at least 1. We use as a justification the fact that statement (2) holds, and 
“properties of arithmetic.” That is, we assume the reader knows, or can prove 
simple statements about how inequalities work, such as the statement “if $y \ge 1$,
then $y^2 \ge 1$.”

Step (4) uses statements (1) and (3). The first statement tells us that x is 
the sum of the four squares in question, and statement (3) tells us that each of 
the squares is at least 1. Again using well-known properties of arithmetic, we 
conclude that x is at least 1 + 1 + 1 + 1, or 4.

At the final step (5), we use statement (4), which is the hypothesis of Theo- 
rem 1.3. The theorem itself is the justification for writing down its conclusion, 
since its hypothesis is a previous statement. Since the statement (5) that is 
the conclusion of Theorem 1.3 is also the conclusion of Theorem 1.4, we have 
now proved Theorem 1.4. That is, we have started with the hypothesis of that 
theorem, and have managed to deduce its conclusion. 
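The same chain of steps can be mirrored in a brute-force check over small values. This is a hypothetical sketch (check_theorem_1_4 is our own name): it enumerates positive integers a, b, c, d up to a bound, forms x as in step (1), and asserts the facts of steps (4) and (5). It only covers the values enumerated; the deductive proof is what covers all of them.

```python
from itertools import product


def check_theorem_1_4(bound):
    """Check Theorem 1.4 for all positive integers a, b, c, d <= bound."""
    for a, b, c, d in product(range(1, bound + 1), repeat=4):
        x = a**2 + b**2 + c**2 + d**2   # step (1): x is a sum of four squares
        assert x >= 4                   # step (4): each square is at least 1
        assert 2 ** x >= x ** 2         # step (5): conclusion via Theorem 1.3
    return True


print(check_theorem_1_4(6))
```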


1.2.2 Reduction to Definitions 


In the previous two theorems, the hypotheses used terms that should have 
been familiar: integers, addition, and multiplication, for instance. In many 
other theorems, including many from automata theory, the terms used in the 
statement may have implications that are less obvious. A useful way to proceed 
in many proofs is: 


• If you are not sure how to start a proof, convert all terms in the hypothesis
to their definitions. 


Here is an example of a theorem that is simple to prove once we have ex- 
pressed its statement in elementary terms. It uses the following two definitions: 


1. A set S is finite if there exists an integer n such that S has exactly n 
elements. We write ||S|| = n, where ||S|| is used to denote the number
of elements in a set S. If the set S is not finite, we say S is infinite. 
Intuitively, an infinite set is a set that contains more than any integer 
number of elements. 
2. If S and T are both subsets of some set U, then T is the complement of S
(with respect to U) if $S \cup T = U$ and $S \cap T = \emptyset$. That is, each element
of U is in exactly one of S and T; put another way, T consists of exactly 
those elements of U that are not in S. 
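Python's built-in set type makes the definition of complement easy to check on a small example. The universe U here is finite only because a program cannot store an infinite set, and the particular sets are arbitrary choices for illustration. The last assertion is the size fact ||S|| + ||T|| = ||U|| for disjoint S and T, which the proof of Theorem 1.5 relies on.

```python
U = set(range(1, 21))                # a finite universe, for illustration only
S = {2, 3, 5, 7, 11, 13, 17, 19}     # an arbitrary subset of U
T = U - S                            # the complement of S with respect to U

assert S | T == U                    # S union T is all of U
assert S & T == set()                # S and T are disjoint
assert len(S) + len(T) == len(U)     # so their sizes add up to the size of U
print(sorted(T))
```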


Theorem 1.5: Let S be a finite subset of some infinite set U. Let T be the 
complement of S with respect to U. Then T is infinite. 


PROOF: Intuitively, this theorem says that if you have an infinite supply of 
something (U), and you take a finite amount away (S), then you still have an 
infinite amount left. Let us begin by restating the facts of the theorem as in 
Fig. 1.4. 


Original Statement          New Statement
S is finite                 There is an integer n such that ||S|| = n
U is infinite               For no integer p is ||U|| = p
T is the complement of S    $S \cup T = U$ and $S \cap T = \emptyset$


Figure 1.4: Restating the givens of Theorem 1.5


We are still stuck, so we need to use a common proof technique called “proof 
by contradiction.” In this proof method, to be discussed further in Section 1.3.3, 
we assume that the conclusion is false. We then use that assumption, together 
with parts of the hypothesis, to prove the opposite of one of the given statements 
of the hypothesis. We have then shown that it is impossible for all parts of the 
hypothesis to be true and for the conclusion to be false at the same time. 
The only possibility that remains is for the conclusion to be true whenever the 
hypothesis is true. That is, the theorem is true. 

In the case of Theorem 1.5, the contradiction of the conclusion is “T is 
finite.” Let us assume T is finite, along with the statement of the hypothesis 
that says S is finite; i.e., ||S|| = n for some integer n. Similarly, we can restate 
the assumption that T is finite as ||T|| = m for some integer m. 

Now one of the given statements tells us that $S \cup T = U$ and $S \cap T = \emptyset$.
That is, the elements of U are exactly the elements of S and T. Thus, there 
must be n + m elements of U. Since n + m is an integer, and we have shown 
||U|| = n + m, it follows that U is finite. More precisely, we showed the number
of elements in U is some integer, which is the definition of “finite.” But the 
statement that U is finite contradicts the given statement that U is infinite. We 
have thus used the contradiction of our conclusion to prove the contradiction 
of one of the given statements of the hypothesis, and by the principle of “proof 
by contradiction” we may conclude the theorem is true. 


Proofs do not have to be so wordy. Having seen the ideas behind the proof, 
let us reprove the theorem in a few lines. 
Statements With Quantifiers


Many theorems involve statements that use the quantifiers “for all” and 
“there exists,” or similar variations, such as “for every” instead of “for all.” 
The order in which these quantifiers appear affects what the statement 
means. It is often helpful to see statements with more than one quantifier 
as a “game” between two players — for-all and there-exists — who take 
turns specifying values for the parameters mentioned in the theorem. “For- 
all” must consider all possible choices, so for-all’s choices are generally left 
as variables. However, “there-exists” only has to pick one value, which 
may depend on the values picked by the players previously. The order in 
which the quantifiers appear in the statement determines who goes first. 
If the last player to make a choice can always find some allowable value, 
then the statement is true. 

For example, consider an alternative definition of “infinite set”: set S 
is infinite if and only if for all integers n, there exists a subset T of S with 
exactly n members. Here, “for-all” precedes “there-exists,” so we must 
consider an arbitrary integer n. Now, “there-exists” gets to pick a subset 
T, and may use the knowledge of n to do so. For instance, if S were the 
set of integers, “there-exists” could pick the subset T = {1,2,...,n} and 
thereby succeed regardless of n. That is a proof that the set of integers is 
infinite. 

The following statement looks like the definition of “infinite,” but is 
incorrect because it reverses the order of the quantifiers: “there exists a 
subset T of set S such that for all n, set T has exactly n members.” Now, 
given a set S such as the integers, player “there-exists” can pick any set 
T; say {1,2,5} is picked. For this choice, player “for-all” must show that 
T has n members for every possible n. However, “for-all” cannot do so. 
For instance, it is false for n = 4, or in fact for any $n \neq 3$.
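The quantifier game can be acted out in code. The sketch below is a minimal illustration under an explicit assumption: since no program can range over an infinite set, a finite set stands in for S, and the hypothetical parameter `max_n` bounds the integers that "for-all" may pick. The helper names (`pick_subset`, `forall_exists`, `exists_forall`) are our own and do not come from the text.

```python
def pick_subset(s, n):
    """Move for "there-exists": knowing n, choose T, a subset of s with exactly n members.

    Mirrors the strategy T = {1, 2, ..., n} by taking the n smallest
    elements of s; returns None when s has fewer than n elements.
    """
    items = sorted(s)
    if n > len(items):
        return None
    return set(items[:n])


def forall_exists(s, max_n):
    """For all n in 0..max_n, there exists T in s with |T| = n (T may depend on n)."""
    return all(pick_subset(s, n) is not None for n in range(max_n + 1))


def exists_forall(candidates, max_n):
    """There exists one T among candidates such that for all n in 0..max_n, |T| = n."""
    return any(all(len(t) == n for n in range(max_n + 1)) for t in candidates)


integers_1_to_50 = set(range(1, 51))
print(forall_exists(integers_1_to_50, 50))  # True: a fresh T is chosen for each n
print(exists_forall([{1, 2, 5}], 4))         # False: no single T has every size
```

With the quantifiers in the second order, no list of candidates can succeed once `max_n` is at least 1, because a single set has only one size.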


PROOF: (of Theorem 1.5) We know that $S \cup T = U$ and S and T are disjoint,
so ||S|| + ||T|| = ||U||. Since S is finite, ||S|| = n for some integer n, and since U
is infinite, there is no integer p such that ||U|| = p. So assume that T is finite;
that is, ||T|| = m for some integer m. Then ||U|| = ||S|| + ||T|| = n + m, which
contradicts the given statement that there is no integer p equal to ||U||.


1.2.3 Other Theorem Forms 


The “if-then” form of theorem is most common in typical areas of mathematics. 
However, we see other kinds of statements proved as theorems also. In this 
section, we shall examine the most common forms of statement and what we 
usually need to do to prove them. 
Ways of Saying “If-Then”


First, there are a number of kinds of theorem statements that look different 
from a simple “if H then C” form, but are in fact saying the same thing: if 
hypothesis H is true for a given value of the parameter(s), then the conclusion 
C is true for the same value. Here are some of the other ways in which “if H 
then C” might appear. 


1. H implies C.
2. H only if C.
3. C if H.
4. Whenever H holds, C follows.


We also see many variants of form (4), such as “if H holds, then C follows,” or 
“whenever H holds, C holds.” 


Example 1.6: The statement of Theorem 1.3 would appear in these four forms 
as: 


1. $x \geq 4$ implies $2^x \geq x^2$.
2. $x \geq 4$ only if $2^x \geq x^2$.
3. $2^x \geq x^2$ if $x \geq 4$.
4. Whenever $x \geq 4$, $2^x \geq x^2$ follows.


In addition, in formal logic one often sees the operator $\rightarrow$ in place of
“if-then.” That is, the statement “if H then C” could appear as $H \rightarrow C$ in some
mathematical literature; we shall not use it here.


If-And-Only-If Statements 


Sometimes, we find a statement of the form “A if and only if B.” Other forms
of this statement are “A iff B,”¹ “A is equivalent to B,” or “A exactly when
B.” This statement is actually two if-then statements: “if A then B,” and “if
B then A.” We prove “A if and only if B” by proving these two statements:


1. The if part: “if B then A,” and 


2. The only-if part: “if A then B,” which is often stated in the equivalent 
form “A only if B.” 
How Formal Do Proofs Have to Be?


The answer to this question is not easy. The bottom line regarding proofs 
is that their purpose is to convince someone, whether it is a grader of your 
classwork or yourself, about the correctness of a strategy you are using in 
your code. If it is convincing, then it is enough; if it fails to convince the 
“consumer” of the proof, then the proof has left out too much. 

Part of the uncertainty regarding proofs comes from the different 
knowledge that the consumer may have. Thus, in Theorem 1.4, we as- 
sumed you knew all about arithmetic, and would believe a statement like 
“if $y \geq 1$ then $y^2 \geq 1$.” If you were not familiar with arithmetic, we would
have to prove that statement by some steps in our deductive proof. 

However, there are certain things that are required in proofs, and 
omitting them surely makes the proof inadequate. For instance, any deductive
proof that uses statements which are not justified by the given or previous
statements cannot be adequate. When doing a proof of an “if
and only if” statement, we must surely have one proof for the “if” part and 
another proof for the “only-if” part. As an additional example, inductive 
proofs (discussed in Section 1.4) require proofs of the basis and induction 
parts. 


The proofs can be presented in either order. In many theorems, one part is 
decidedly easier than the other, and it is customary to present the easy direction 
first and get it out of the way. 

In formal logic, one may see the operator $\leftrightarrow$ or $\equiv$ to denote an
“if-and-only-if” statement. That is, $A \equiv B$ and $A \leftrightarrow B$ mean the same as
“A if and only if B.”

When proving an if-and-only-if statement, it is important to remember that 
you must prove both the “if” and “only-if” parts. Sometimes, you will find it 
helpful to break an if-and-only-if into a succession of several equivalences. That 
is, to prove “A if and only if B,” you might first prove “A if and only if C,” and 
then prove “C if and only if B.” That method works, as long as you remember 
that each if-and-only-if step must be proved in both directions. Proving any 
one step in only one of the directions invalidates the entire proof. 

The following is an example of a simple if-and-only-if proof. It uses the 
notations: 


1. $\lfloor x \rfloor$, the floor of real number x, is the greatest integer equal to or less
than x.

2. $\lceil x \rceil$, the ceiling of real number x, is the least integer equal to or greater
than x.


¹Iff, short for “if and only if,” is a non-word that is used in some mathematical treatises
for succinctness.


Theorem 1.7: Let x be a real number. Then $\lfloor x \rfloor = \lceil x \rceil$ if and only if x is an
integer.


PROOF: (Only-if part) In this part, we assume $\lfloor x \rfloor = \lceil x \rceil$ and try to prove x is
an integer. Using the definitions of the floor and ceiling, we notice that $\lfloor x \rfloor \leq x$
and $\lceil x \rceil \geq x$. However, we are given that $\lfloor x \rfloor = \lceil x \rceil$. Thus, we may substitute
the ceiling for the floor in the first inequality to conclude $\lceil x \rceil \leq x$. Since
both $\lceil x \rceil \leq x$ and $\lceil x \rceil \geq x$ hold, we may conclude by properties of arithmetic
inequalities that $\lceil x \rceil = x$. Since $\lceil x \rceil$ is always an integer, x must also be an
integer in this case.


(If part) Now, we assume x is an integer and try to prove $\lfloor x \rfloor = \lceil x \rceil$. This part
is easy. By the definitions of floor and ceiling, when x is an integer, both $\lfloor x \rfloor$
and $\lceil x \rceil$ are equal to x, and therefore equal to each other.
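Theorem 1.7 can be spot-checked with Python's `math.floor` and `math.ceil`. As an assumption of this sketch, exact rationals from `fractions.Fraction` stand in for real numbers, which avoids floating-point rounding; a finite sample is evidence only, not a replacement for the proof. The function names are illustrative.

```python
import math
from fractions import Fraction


def floor_equals_ceiling(x):
    """The left side of Theorem 1.7: floor(x) == ceil(x)."""
    return math.floor(x) == math.ceil(x)


def is_integer(x):
    """The right side: a rational is an integer exactly when its reduced denominator is 1."""
    return Fraction(x).denominator == 1


# Sample rationals p/d; Fraction reduces them, so 6/3 becomes the integer 2.
SAMPLES = [Fraction(p, d) for p in range(-20, 21) for d in range(1, 6)]

# Both directions at once: the two predicates agree on every sample.
print(all(floor_equals_ceiling(x) == is_integer(x) for x in SAMPLES))  # True
print(math.floor(Fraction(7, 2)), math.ceil(Fraction(7, 2)))          # 3 4
```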


1.2.4 Theorems That Appear Not to Be If-Then Statements


Sometimes, we encounter a theorem that appears not to have a hypothesis. An 
example is the well-known fact from trigonometry: 


Theorem 1.8: $\sin^2 \theta + \cos^2 \theta = 1$.


Actually, this statement does have a hypothesis, and the hypothesis consists 
of all the statements you need to know to interpret the statement. In particular, 
the hidden hypothesis is that $\theta$ is an angle, and therefore the functions sine
and cosine have their usual meaning for angles. From the definitions of these 
terms, and the Pythagorean Theorem (in a right triangle, the square of the 
hypotenuse equals the sum of the squares of the other two sides), you could 
prove the theorem. In essence, the if-then form of the theorem is really: “if $\theta$
is an angle, then $\sin^2 \theta + \cos^2 \theta = 1$.”


1.3 Additional Forms of Proof 


In this section, we take up several additional topics concerning how to construct 
proofs: 


1. Proofs about sets.
2. Proofs by contradiction.
3. Proofs by counterexample.
1.3.1 Proving Equivalences About Sets


In automata theory, we are frequently asked to prove a theorem which says that 
the sets constructed in two different ways are the same sets. Often, these sets 
are sets of character strings, and the sets are called “languages,” but in this 
section the nature of the sets is unimportant. If E and F are two expressions 
representing sets, the statement E = F means that the two sets represented 
are the same. More precisely, every element in the set represented by E is in
the set represented by F, and every element in the set represented by F is in 
the set represented by E. 


Example 1.9: The commutative law of union says that we can take the union
of two sets R and S in either order. That is, $R \cup S = S \cup R$. In this case, E is
the expression $R \cup S$ and F is the expression $S \cup R$. The commutative law of
union says that E = F.


We can write a set-equality E = F as an if-and-only-if statement: an element 
x is in E if and only if x is in F. As a consequence, we see the outline of a 
proof of any statement that asserts the equality of two sets E = F; it follows 
the form of any if-and-only-if proof: 


1. Prove that if x is in E, then x is in F.
2. Prove that if x is in F, then x is in E.


As an example of this proof process, let us prove the distributive law of 
union over intersection: 


Theorem 1.10: $R \cup (S \cap T) = (R \cup S) \cap (R \cup T)$.
PROOF: The two set-expressions involved are $E = R \cup (S \cap T)$ and

$$F = (R \cup S) \cap (R \cup T)$$


We shall prove the two parts of the theorem in turn. In the “if” part we assume 
element x is in E and show it is in F. This part, summarized in Fig. 1.5, uses 
the definitions of union and intersection, with which we assume you are familiar. 

Then, we must prove the “only-if” part of the theorem. Here, we assume x 
is in F and show it is in E. The steps are summarized in Fig. 1.6. Since we
have now proved both parts of the if-and-only-if statement, the distributive law 
of union over intersection is proved. 
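The two-part structure of a set-equality proof corresponds directly to two subset tests. The sketch below checks Theorem 1.10 exhaustively for every choice of R, S, and T drawn from a small universe. The universe `{0, 1, 2}` and the function names are assumptions made for illustration, and a finite check is evidence rather than a proof for arbitrary sets.

```python
from itertools import combinations


def all_subsets(universe):
    """Every subset of a finite universe, as Python sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def distributive_law_holds(universe):
    """Check R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T) for all R, S, T within universe."""
    subsets = all_subsets(universe)
    for r in subsets:
        for s in subsets:
            for t in subsets:
                e = r | (s & t)          # E = R ∪ (S ∩ T)
                f = (r | s) & (r | t)    # F = (R ∪ S) ∩ (R ∪ T)
                # "If" part: E ⊆ F.  "Only-if" part: F ⊆ E.
                if not (e <= f and f <= e):
                    return False
    return True


print(distributive_law_holds({0, 1, 2}))  # True
```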


1.3.2 The Contrapositive 


Every if-then statement has an equivalent form that in some circumstances is 
easier to prove. The contrapositive of the statement “if H then C” is “if not C 
then not H.” A statement and its contrapositive are either both true or both 
false, so we can prove either to prove the other. 

To see why “if H then C” and “if not C then not H” are logically equivalent, 
first observe that there are four cases to consider: 
     Statement                                    Justification

1.   $x$ is in $R \cup (S \cap T)$                Given
2.   $x$ is in $R$ or $x$ is in $S \cap T$        (1) and definition of union
3.   $x$ is in $R$ or $x$ is in                   (2) and definition of intersection
     both $S$ and $T$
4.   $x$ is in $R \cup S$                         (3) and definition of union
5.   $x$ is in $R \cup T$                         (3) and definition of union
6.   $x$ is in $(R \cup S) \cap (R \cup T)$       (4), (5), and definition of intersection


Figure 1.5: Steps in the “if” part of Theorem 1.10


     Statement                                    Justification

1.   $x$ is in $(R \cup S) \cap (R \cup T)$       Given
2.   $x$ is in $R \cup S$                         (1) and definition of intersection
3.   $x$ is in $R \cup T$                         (1) and definition of intersection
4.   $x$ is in $R$ or $x$ is in                   (2), (3), and reasoning about unions
     both $S$ and $T$
5.   $x$ is in $R$ or $x$ is in $S \cap T$        (4) and definition of intersection
6.   $x$ is in $R \cup (S \cap T)$                (5) and definition of union


Figure 1.6: Steps in the “only-if” part of Theorem 1.10


1. H and C both true.
2. H true and C false.
3. C true and H false.
4. H and C both false.


1. H and C both true. 
. H true and C false. 
. C true and H false. 


AeA © N 


. H and C both false. 


There is only one way to make an if-then statement false; the hypothesis must 
be true and the conclusion false, as in case (2). For the other three cases, 
including case (4) where the conclusion is false, the if-then statement itself is 
true. 

Now, consider for which cases the contrapositive “if not C then not H” is 
false. In order for this statement to be false, its hypothesis (which is “not C”) 
must be true, and its conclusion (which is “not H”) must be false. But “not 
C” is true exactly when C is false, and “not H” is false exactly when H is true. 
These two conditions are again case (2), which shows that in each of the four 
cases, the original statement and its contrapositive are either both true or both 
false; i.e., they are logically equivalent. 
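The four-case argument is a truth table, and it can be computed directly. In this sketch, the hypothetical helper `implies` encodes the rule that "if h then c" is false only when h is true and c is false. The rows follow the same order as cases (1) to (4) above, and the converse "if C then H" is added as a contrasting column.

```python
from itertools import product


def implies(h, c):
    """Truth value of "if h then c": false only when h is true and c is false."""
    return (not h) or c


print("case  H      C      if H then C  if not C then not H  if C then H")
# product yields (T,T), (T,F), (F,T), (F,F): cases (1) through (4) above.
for case, (h, c) in enumerate(product([True, False], repeat=2), start=1):
    statement = implies(h, c)
    contrapositive = implies(not c, not h)
    converse = implies(c, h)
    print(f"({case})   {h!s:6} {c!s:6} {statement!s:12} {contrapositive!s:20} {converse}")
```

The statement and contrapositive columns agree in all four rows, while the converse column differs from them in cases (2) and (3).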


Example 1.11: Recall Theorem 1.3, whose statement was: “if $x \geq 4$, then
$2^x \geq x^2$.” The contrapositive of this statement is “if not $2^x \geq x^2$ then not
$x \geq 4$.” In more colloquial terms, making use of the fact that “not $a \geq b$” is
the same as $a < b$, the contrapositive is “if $2^x < x^2$ then $x < 4$.”


Saying “If-And-Only-If” for Sets


As we mentioned, theorems that state equivalences of expressions about
sets are if-and-only-if statements. Thus, Theorem 1.10 could have been
stated: an element x is in $R \cup (S \cap T)$ if and only if x is in

$$(R \cup S) \cap (R \cup T)$$

Another common expression of a set-equivalence is with the locution
“all-and-only.” For instance, Theorem 1.10 could as well have been stated
“the elements of $R \cup (S \cap T)$ are all and only the elements of

$$(R \cup S) \cap (R \cup T).”$$


The Converse


Do not confuse the terms “contrapositive” and “converse.” The converse
of an if-then statement is the “other direction”; that is, the converse of “if
H then C” is “if C then H.” Unlike the contrapositive, which is logically
equivalent to the original, the converse is not equivalent to the original
statement. In fact, the two parts of an if-and-only-if proof are always
some statement and its converse.


When we are asked to prove an if-and-only-if theorem, the use of the con- 
trapositive in one of the parts allows us several options. For instance, suppose 
we want to prove the set equivalence E = F. Instead of proving “if x is in E 
then x is in F and if x is in F then x is in E,” we could also put one direction
in the contrapositive. One equivalent proof form is: 


• If x is in E then x is in F, and if x is not in E then x is not in F.


We could also interchange E and F in the statement above. 


1.3.3 Proof by Contradiction 


Another way to prove a statement of the form “if H then C” is to prove the 
statement 
• “H and not C implies falsehood.”


That is, start by assuming both the hypothesis H and the negation of the 
conclusion C. Complete the proof by showing that something known to be 
false follows logically from H and not C. This form of proof is called proof by 
contradiction. 


Example 1.12: Recall Theorem 1.5, where we proved the if-then statement 
with hypothesis H = “U is an infinite set, S is a finite subset of U, and T is 
the complement of S with respect to U.” The conclusion C was “T is infinite.” 
We proceeded to prove this theorem by contradiction. We assumed “not C”; 
that is, we assumed T was finite. 

Our proof was to derive a falsehood from H and not C. We first showed
that if S and T are both finite, then U must also be finite.
But since U is stated in the hypothesis H to be infinite, and a set cannot be 
both finite and infinite, we have proved the logical statement “false.” In logical 
terms, we have both a proposition p (U is finite) and its negation, not p (U 
is infinite). We then use the fact that “p and not p” is logically equivalent to 
“false.” 


To see why proofs by contradiction are logically correct, recall from Sec- 
tion 1.3.2 that there are four combinations of truth values for H and C. Only 
the second case, H true and C false, makes the statement “if H then C” false. 
By showing that H and not C leads to falsehood, we are showing that case 2 
cannot occur. Thus, the only possible combinations of truth values for H and 
C are the three combinations that make “if H then C” true. 


1.3.4 Counterexamples 


In real life, we are not told to prove a theorem. Rather, we are faced with some- 
thing that seems true — a strategy for implementing a program for example — 
and we need to decide whether or not the “theorem” is true. To resolve the 
question, we may alternately try to prove the theorem, and if we cannot, try to 
prove that its statement is false. 

Theorems generally are statements about an infinite number of cases, perhaps
all values of their parameters. Indeed, strict mathematical convention will
only dignify a statement with the title “theorem” if it has an infinite number
of cases; statements that have no parameters, or that apply to only a finite
number of values of their parameter(s), are called observations. It is sufficient to
show that an alleged theorem is false in any one case in order to show it is not a 
theorem. The situation is analogous to programs, since a program is generally 
considered to have a bug if it fails to operate correctly for even one input on 
which it was expected to work. 

It often is easier to prove that a statement is not a theorem than to prove 
it is a theorem. As we mentioned, if S is any statement, then the statement 
“S is not a theorem” is itself a statement without parameters, and thus can 
be regarded as an observation rather than a theorem. The following are two
examples, first of an obvious nontheorem, and the second a statement that just 
misses being a theorem and that requires some investigation before resolving 
the question of whether it is a theorem or not. 


Alleged Theorem 1.13: All primes are odd. (More formally, we might say: 
if integer x is a prime, then x is odd.) 


DISPROOF: The integer 2 is a prime, but 2 is even. 
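Searching for a counterexample is mechanical once hypothesis and conclusion are written as predicates. The following sketch, whose helpers (`is_prime`, `is_odd`, `find_counterexample`) are illustrative names rather than anything from the text, scans a finite range for an x satisfying H but not C. Finding one disproves the alleged theorem; finding none in a finite range proves nothing.

```python
def is_prime(n):
    """Trial division: n is prime if n >= 2 and no d with d*d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_odd(x):
    return x % 2 == 1


def find_counterexample(hypothesis, conclusion, candidates):
    """Return the first x with hypothesis(x) true and conclusion(x) false, else None."""
    for x in candidates:
        if hypothesis(x) and not conclusion(x):
            return x
    return None


print(find_counterexample(is_prime, is_odd, range(100)))     # 2
print(find_counterexample(is_prime, is_odd, range(3, 100)))  # None
```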


Now, let us discuss a “theorem” involving modular arithmetic. There is an 
essential definition that we must first establish. If a and b are positive integers, 
then a mod b is the remainder when a is divided by b, that is, the unique integer 
r between 0 and $b - 1$ such that a = qb + r for some integer q. For example,
8 mod 3 = 2, and 9 mod 3 = 0. Our first proposed theorem, which we shall 
determine to be false, is: 


Alleged Theorem 1.14: There is no pair of integers a and b such that

$$a \bmod b = b \bmod a$$


When asked to do things with pairs of objects, such as a and b here, it is 
often possible to simplify the relationship between the two by taking advantage 
of symmetry. In this case, we can focus on the case where a < b, since if b < a 
we can swap a and b and get the same equation as in Alleged Theorem 1.14. 
We must be careful, however, not to forget the third case, where a = b. This 
case turns out to be fatal to our proof attempts. 

Let us assume a < b. Then a mod b = a, since in the definition of a mod b
we have q = 0 and r = a. That is, when a < b we have $a = 0 \times b + a$. But
$b \bmod a < a$, since anything mod a is between 0 and $a - 1$. Thus, when a < b,
$b \bmod a < a \bmod b$, so $a \bmod b = b \bmod a$ is impossible. Using the argument
of symmetry above, we also know that $a \bmod b \neq b \bmod a$ when b < a.

However, consider the third case: a = b. Since $x \bmod x = 0$ for any positive integer
x, we do have $a \bmod b = b \bmod a$ if a = b. We thus have a disproof of the
alleged theorem: 


DISPROOF: (of Alleged Theorem 1.14) Let a = b = 2. Then 


$$a \bmod b = b \bmod a = 0$$


In the process of finding the counterexample, we have in fact discovered the 
exact conditions under which the alleged theorem holds. Here is the correct 
version of the theorem, and its proof. 


Theorem 1.15: $a \bmod b = b \bmod a$ if and only if a = b.
PROOF: (If part) Assume a = b. Then as we observed above, x mod x = 0 for
any integer x. Thus, a mod b = b mod a = 0 whenever a = b. 


(Only-if part) Now, assume a mod b = b mod a. The best technique is a 
proof by contradiction, so assume in addition the negation of the conclusion; 
that is, assume $a \neq b$. Then since a = b is eliminated, we have only to consider
the cases a < b and b < a.

We already observed above that when a < b, we have a mod b = a and 
b mod a < a. Combined with the hypothesis a mod b = b mod a, these
statements give b mod a = a, contradicting b mod a < a.

By symmetry, if b < a then b mod a = b and a mod b < b. We again derive 
a contradiction of the hypothesis, and conclude the only-if part is also true. We 
have now proved both directions and conclude that the theorem is true. 
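Theorem 1.15 can be spot-checked over a finite range. The sketch below computes a mod b as defined above through Python's built-in `divmod`, which for positive integers returns the q and r with a = qb + r and 0 ≤ r ≤ b − 1. The function names and the bound of 40 are assumptions for illustration; the check supports, but does not replace, the proof.

```python
def mod(a, b):
    """a mod b for positive integers: the unique r in 0..b-1 with a = q*b + r."""
    _, r = divmod(a, b)  # divmod also returns the quotient q, unused here
    return r


def equal_mod_pairs(limit):
    """All pairs (a, b) with 1 <= a, b < limit and a mod b == b mod a."""
    return [(a, b) for a in range(1, limit) for b in range(1, limit)
            if mod(a, b) == mod(b, a)]


pairs = equal_mod_pairs(40)
print(mod(8, 3), mod(9, 3))           # 2 0
print(all(a == b for a, b in pairs))  # True: only the diagonal a == b appears
```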


1.4 Inductive Proofs 


There is a special form of proof, called “inductive,” that is essential when dealing 
with recursively defined objects. Many of the most familiar inductive proofs 
deal with integers, but in automata theory, we also need inductive proofs about 
such recursively defined concepts as trees and expressions of various sorts, such 
as the regular expressions that were mentioned briefly in Section 1.1.2. In this 
section, we shall introduce the subject of inductive proofs first with “simple” 
inductions on integers. Then, we show how to perform “structural” inductions 
on any recursively defined concept. 


1.4.1 Inductions on Integers 


Suppose we are given a statement S(n), about an integer n, to prove. One 
common approach is to prove two things: 


1. The basis, where we show S(i) for a particular integer i. Usually, i = 0 
or i = 1, but there are examples where we want to start at some higher 
i, perhaps because the statement S is false for a few small integers. 


2. The inductive step, where we assume $n \geq i$, where i is the basis integer,
and we show that “if S(n) then S(n + 1).”


Intuitively, these two parts should convince us that S(n) is true for every 
integer n that is equal to or greater than the basis integer i. We can argue as 
follows. Suppose S(n) were false for one or more of those integers. Then there 
would have to be a smallest value of n, say j, for which S(j) is false, and yet
$j \geq i$. Now j could not be i, because we prove in the basis part that S(i) is
true. Thus, j must be greater than i. We now know that $j - 1 \geq i$, and S(j − 1)
is true, because j was chosen as the smallest value for which S fails.

However, we proved in the inductive part that if $n \geq i$, then S(n) implies
S(n + 1). Suppose we let n = j − 1. Then we know from the inductive step
that S(j − 1) implies S(j). Since we also know S(j − 1), we can conclude S(j).
We have assumed the negation of what we wanted to prove; that is, we
assumed S(j) was false for some $j \geq i$. In each case, we derived a contradiction,
so we have a “proof by contradiction” that S(n) is true for all $n \geq i$.

Unfortunately, there is a subtle logical flaw in the above reasoning. Our 
assumption that we can pick the least $j \geq i$ for which S(j) is false depends on
our believing the principle of induction in the first place. That is, the only way 
to prove that we can find such a j is to prove it by a method that is essentially 
an inductive proof. However, the “proof” discussed above makes good intuitive 
sense, and matches our understanding of the real world. Thus, we generally 
take as an integral part of our logical reasoning system: 


• The Induction Principle: If we prove S(i) and we prove that for all $n \geq i$,
S(n) implies S(n + 1), then we may conclude S(n) for all $n \geq i$.


The following two examples illustrate the use of the induction principle to prove 
theorems about integers. 


Theorem 1.16: For all $n \geq 0$:

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \tag{1.1}$$


PROOF: The proof is in two parts: the basis and the inductive step; we prove 
each in turn. 


BASIS: For the basis, we pick n = 0. It might seem surprising that the theorem
even makes sense for n = 0, since the left side of Equation (1.1) is $\sum_{i=1}^{0} i^2$ when
n = 0. However, there is a general principle that when the upper limit of a sum
(0 in this case) is less than the lower limit (1 here), the sum is over no terms
and therefore the sum is 0. That is, $\sum_{i=1}^{0} i^2 = 0$.

The right side of Equation (1.1) is also 0, since $0 \times (0+1) \times (2 \times 0 + 1)/6 = 0$.
Thus, Equation (1.1) is true when n = 0. 


INDUCTION: Now, assume $n \geq 0$. We must prove the inductive step, that
Equation (1.1) implies the same formula with n + 1 substituted for n. The
latter formula is

$$\sum_{i=1}^{[n+1]} i^2 = \frac{[n+1]([n+1]+1)(2[n+1]+1)}{6} \tag{1.2}$$


We may simplify Equations (1.1) and (1.2) by expanding the sums and products 
on the right sides. These equations become: 


$$\sum_{i=1}^{n} i^2 = (2n^3 + 3n^2 + n)/6 \tag{1.3}$$

$$\sum_{i=1}^{n+1} i^2 = (2n^3 + 9n^2 + 13n + 6)/6 \tag{1.4}$$

We need to prove (1.4) using (1.3), since in the induction principle, these are
statements S(n +1) and S(n), respectively. The “trick” is to break the sum to 
n + 1 on the left of (1.4) into a sum to n plus the (n + 1)st term. In that way, 
we can replace the sum to n by the left side of (1.3) and show that (1.4) is true. 
These steps are as follows: 


$$\sum_{i=1}^{n} i^2 + (n+1)^2 = (2n^3 + 9n^2 + 13n + 6)/6 \tag{1.5}$$

$$(2n^3 + 3n^2 + n)/6 + (n^2 + 2n + 1) = (2n^3 + 9n^2 + 13n + 6)/6 \tag{1.6}$$


The final verification that (1.6) is true requires only simple polynomial algebra 
on the left side to show it is identical to the right side. 
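The "simple polynomial algebra" for (1.6) amounts to multiplying both sides by 6 and comparing coefficients. The sketch below does exactly that with polynomials stored as coefficient lists, an encoding chosen for this illustration, and also sanity-checks (1.1) numerically for small n. The coefficient comparison is the actual algebraic step; the numeric loop is only a check, and all names are hypothetical.

```python
def sum_of_squares(n):
    """Left side of (1.1). For n = 0, range(1, 1) is empty, so the sum is 0."""
    return sum(i * i for i in range(1, n + 1))


def closed_form(n):
    """Right side of (1.1); n(n+1)(2n+1) is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


# Polynomials in n as coefficient lists [c0, c1, c2, c3], i.e. c0 + c1*n + c2*n^2 + c3*n^3.
# Multiplying both sides of (1.6) by 6 clears the fractions:
SUM_TO_N = [0, 1, 3, 2]     # 2n^3 + 3n^2 + n         (6 times the sum to n, from (1.3))
NEW_TERM = [6, 12, 6, 0]    # 6(n^2 + 2n + 1)         (6 times the (n+1)st term)
SUM_TO_N1 = [6, 13, 9, 2]   # 2n^3 + 9n^2 + 13n + 6   (6 times the right side)


def poly_add(p, q):
    """Add two coefficient lists of equal length, term by term."""
    return [a + b for a, b in zip(p, q)]


print(poly_add(SUM_TO_N, NEW_TERM) == SUM_TO_N1)                     # True
print(all(sum_of_squares(n) == closed_form(n) for n in range(101)))  # True
```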


Example 1.17: In the next example, we prove Theorem 1.3 from Section 1.2.1. 
Recall this theorem states that if $x \geq 4$, then $2^x \geq x^2$. We gave an informal
proof based on the idea that the ratio $x^2/2^x$ shrinks as x grows above 4. We
can make the idea precise if we prove the statement $2^x \geq x^2$ by induction on
x, starting with a basis of x = 4. Note that the statement does not hold for every
x < 4; for instance, it is false for x = 3, since $2^3 = 8 < 9 = 3^2$.


BASIS: If x = 4, then $2^x$ and $x^2$ are both 16. Thus, $2^4 \geq 4^2$ holds.


INDUCTION: Suppose for some $x \geq 4$ that $2^x \geq x^2$. With this statement as
the hypothesis, we need to prove the same statement, with x + 1 in place of x,
that is, $2^{[x+1]} \geq [x+1]^2$. These are the statements S(x) and S(x + 1) in the
induction principle; the fact that we are using x instead of n as the parameter 
should not be of concern; x or n is just a local variable. 

As in Theorem 1.16, we should rewrite S(x + 1) so it can make use of S(x). 
In this case, we can write $2^{[x+1]}$ as $2 \times 2^x$. Since S(x) tells us that $2^x \geq x^2$, we
can conclude that $2^{x+1} = 2 \times 2^x \geq 2x^2$.

But we need something different; we need to show that $2^{x+1} \geq (x+1)^2$.
One way to prove this statement is to prove that $2x^2 \geq (x+1)^2$ and then use
the transitivity of $\geq$ to show $2^{x+1} \geq 2x^2 \geq (x+1)^2$. In our proof that

$$2x^2 \geq (x+1)^2 \tag{1.7}$$

we may use the assumption that $x \geq 4$. Begin by simplifying (1.7):


$$x^2 \geq 2x + 1 \tag{1.8}$$

Divide (1.8) by x, to get:

$$x \geq 2 + \frac{1}{x} \tag{1.9}$$


Since $x \geq 4$, we know $1/x \leq 1/4$. Thus, the left side of (1.9) is at least
4, and the right side is at most 2.25. We have thus proved the truth of (1.9).
Integers as Recursively Defined Concepts


We mentioned that inductive proofs are useful when the subject matter is
recursively defined. Yet our first examples were inductions on integers,
which we do not normally think of as “recursively defined.” However,
there is a natural, recursive definition of when a number is a nonnegative
integer, and this definition does indeed match the way inductions on integers
proceed: from objects defined first, to those defined later.


BASIS: 0 is an integer. 


INDUCTION: If n is an integer, then so is n + 1. 


Therefore, Equations (1.8) and (1.7) are also true. Equation (1.7) in turn gives
us $2x^2 \geq (x+1)^2$ for $x \geq 4$ and lets us prove statement S(x + 1), which we
recall was $2^{x+1} \geq (x+1)^2$.
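A finite check of Example 1.17 can confirm the basis, the failing value x = 3, and each link in the inductive chain; it does not replace the induction, which covers every $x \geq 4$. The function names below are hypothetical, chosen for this sketch.

```python
def s(x):
    """S(x): 2^x >= x^2."""
    return 2 ** x >= x ** 2


def step_holds(x):
    """The chain in the inductive step: 2^(x+1) = 2 * 2^x >= 2x^2 >= (x+1)^2.

    The first >= is where S(x) is used; the second is inequality (1.7).
    """
    return 2 ** (x + 1) == 2 * 2 ** x and 2 * 2 ** x >= 2 * x * x >= (x + 1) ** 2


print([x for x in range(0, 10) if not s(x)])  # [3]
print(all(step_holds(x) for x in range(4, 200)))  # True
```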


1.4.2 More General Forms of Integer Inductions 


Sometimes an inductive proof is made possible only by using a more general 
scheme than the one proposed in Section 1.4.1, where we proved a statement S 
for one basis value and then proved that “if S(n) then S(n+1).” Two important 
generalizations of this scheme are: 


1. We can use several basis cases. That is, we prove S(i), S(i + 1), ..., S(j)
for some j > i.


2. In proving S(n + 1), we can use the truth of all the statements
S(i), S(i + 1), ..., S(n)
rather than just using S(n). Moreover, if we have proved basis cases up
to S(j), then we can assume $n \geq j$, rather than just $n \geq i$.


The conclusion to be made from this basis and inductive step is that S(n) is
true for all $n \geq i$.


Example 1.18: The following example will illustrate the potential of both 
principles. The statement S(n) we would like to prove is that if $n \geq 8$, then n
can be written as a sum of 3’s and 5’s. Notice, incidentally, that 7 cannot be 
written as a sum of 3’s and 5’s. 


BASIS: The basis cases are S(8), S(9), and S(10). The proofs are 8 = 3 + 5,
9 = 3 + 3 + 3, and 10 = 5 + 5, respectively.
INDUCTION: Assume that $n \geq 10$ and that S(8), S(9), ..., S(n) are true. We
must prove S(n +1) from these given facts. Our strategy is to subtract 3 from 
n + 1, observe that this number must be writable as a sum of 3’s and 5’s, and 
add one more 3 to the sum to get a way to write n + 1. 

More formally, observe that $n - 2 \geq 8$, so we may assume S(n − 2). That
is, n − 2 = 3a + 5b for some nonnegative integers a and b. Then n + 1 = 3 + 3a + 5b, so
n + 1 can be written as the sum of a + 1 3’s and b 5’s. That proves S(n + 1)
and concludes the inductive step. 
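The induction of Example 1.18 can also be read as an algorithm: a recursive call plays the role of the inductive hypothesis S(n − 2), and the three basis cases stop the recursion. The function name `threes_and_fives` is our own, and this is an illustrative sketch rather than anything from the text.

```python
def threes_and_fives(n):
    """Return nonnegative (a, b) with n == 3*a + 5*b, following Example 1.18's induction."""
    if n < 8:
        raise ValueError("S(n) is only claimed for n >= 8")
    basis = {8: (1, 1), 9: (3, 0), 10: (0, 2)}  # 8 = 3+5, 9 = 3+3+3, 10 = 5+5
    if n in basis:
        return basis[n]
    # Inductive step: n = m + 1 with m >= 10, so n - 3 = m - 2 >= 8 is already covered.
    a, b = threes_and_fives(n - 3)
    return a + 1, b                               # add one more 3


print(threes_and_fives(23))  # (6, 1): 23 = 3*6 + 5*1
```

For a target value of 11 or more, the recursion reaches back three steps, which is why three consecutive basis cases are needed.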


1.4.3 Structural Inductions 


In automata theory, there are several recursively defined structures about which 
we need to prove statements. The familiar notions of trees and expressions 
are important examples. Like inductions, all recursive definitions have a basis 
case, where one or more elementary structures are defined, and an inductive 
step, where more complex structures are defined in terms of previously defined 
structures. 


Example 1.19: Here is the recursive definition of a tree: 
BASIS: A single node is a tree, and that node is the root of the tree. 
INDUCTION: If $T_1, T_2, \ldots, T_k$ are trees, then we can form a new tree as follows:
1. Begin with a new node N, which is the root of the tree.
2. Add copies of all the trees $T_1, T_2, \ldots, T_k$.
3. Add edges from node N to the roots of each of the trees $T_1, T_2, \ldots, T_k$.


Figure 1.7 shows the inductive construction of a tree with root N from k smaller 
trees. 


Figure 1.7: Inductive construction of a tree 


Example 1.20: Here is another recursive definition. This time we define 
expressions using the arithmetic operators + and *, with both numbers and 
variables allowed as operands. 
Intuition Behind Structural Induction


We can suggest informally why structural induction is a valid proof
method. Imagine the recursive definition establishing, one at a time, that
certain structures $X_1, X_2, \ldots$ meet the definition. The basis elements come
first, and the fact that $X_i$ is in the defined set of structures can only depend
on the membership in the defined set of structures that precede $X_i$
on the list. Viewed this way, a structural induction is nothing but an induction
on integer n of the statement $S(X_n)$. This induction may be of
the generalized form discussed in Section 1.4.2, with multiple basis cases
and an inductive step that uses all previous instances of the statement.
However, we should remember, as explained in Section 1.4.1, that this 
intuition is not a formal proof, and in fact we must assume the validity 
of this induction principle as we did the validity of the original induction 
principle of that section. 


BASIS: Any number or letter (i.e., a variable) is an expression. 
INDUCTION: If E and F are expressions, then so are E + F, E * F, and (E).


For example, both 2 and x are expressions by the basis. The inductive step
tells us x + 2, (x + 2), and 2 * (x + 2) are all expressions. Notice how each of
these expressions depends on the previous ones being expressions.
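Recursive definitions like this one translate directly into recursive data types. In the sketch below, using one Python class per rule is an assumption of the illustration (the class names are hypothetical), and each function that consumes an expression has one case per rule. The function `paren_counts` follows the same case analysis as the proof of Theorem 1.22.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Atom:        # basis: a number or a variable
    text: str


@dataclass
class Plus:        # inductive rule: E + F
    left: "Expr"
    right: "Expr"


@dataclass
class Times:       # inductive rule: E * F
    left: "Expr"
    right: "Expr"


@dataclass
class Paren:       # inductive rule: (E)
    inner: "Expr"


Expr = Union[Atom, Plus, Times, Paren]


def render(e):
    """Write an expression back out as text, one case per rule of the definition."""
    if isinstance(e, Atom):
        return e.text
    if isinstance(e, Plus):
        return f"{render(e.left)} + {render(e.right)}"
    if isinstance(e, Times):
        return f"{render(e.left)} * {render(e.right)}"
    return f"({render(e.inner)})"


def paren_counts(e):
    """(left, right) parenthesis counts, computed by structural recursion."""
    if isinstance(e, Atom):
        return (0, 0)
    if isinstance(e, (Plus, Times)):
        l1, r1 = paren_counts(e.left)
        l2, r2 = paren_counts(e.right)
        return (l1 + l2, r1 + r2)
    left, right = paren_counts(e.inner)
    return (left + 1, right + 1)


two, x = Atom("2"), Atom("x")
example = Times(two, Paren(Plus(x, two)))  # built bottom-up, as in the text
print(render(example))                      # 2 * (x + 2)
print(paren_counts(example))                # (1, 1)
```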


When we have a recursive definition, we can prove theorems about it using 
the following proof form, which is called structural induction. Let S(X) be a 
statement about the structures X that are defined by some particular recursive 
definition. 


1. As a basis, prove S(X) for the basis structure(s) X. 


2. For the inductive step, take a structure X that the recursive definition
says is formed from $Y_1, Y_2, \ldots, Y_k$. Assume that the statements
$S(Y_1), S(Y_2), \ldots, S(Y_k)$ hold, and use these to prove S(X).


Our conclusion is that S(X) is true for all X. The next two theorems are 
examples of facts that can be proved about trees and expressions. 


Theorem 1.21: Every tree has one more node than it has edges. 


PROOF: The formal statement S(T) we need to prove by structural induction 
is: “if T is a tree, and T has n nodes and e edges, then n = e + 1.”


BASIS: The basis case is when T is a single node. Then n = 1 and e = 0, so 
the relationship n = e + 1 holds. 
INDUCTION: Let T be a tree built by the inductive step of the definition,
from root node N and k smaller trees $T_1, T_2, \ldots, T_k$. We may assume that the
statements $S(T_i)$ hold for $i = 1, 2, \ldots, k$. That is, let $T_i$ have $n_i$ nodes and $e_i$
edges; then $n_i = e_i + 1$.

The nodes of T are node N and all the nodes of the $T_i$’s. There are thus
$1 + n_1 + n_2 + \cdots + n_k$ nodes in T. The edges of T are the k edges we added
explicitly in the inductive definition step, plus the edges of the $T_i$’s. Hence, T
has

$$k + e_1 + e_2 + \cdots + e_k \tag{1.10}$$


edges. If we substitute $e_i + 1$ for $n_i$ in the count of the number of nodes of T,
we find that T has

$$1 + [e_1 + 1] + [e_2 + 1] + \cdots + [e_k + 1] \tag{1.11}$$

nodes. Since there are k of the “+1” terms in (1.11), we can regroup it as:

$$k + 1 + e_1 + e_2 + \cdots + e_k \tag{1.12}$$


This expression is exactly 1 more than the expression of (1.10) that was given 
for the number of edges of T. Thus, T has one more node than it has edges. 
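Building trees by the recursive definition and counting their nodes and edges can spot-check Theorem 1.21, though such a check does not prove it. In this sketch the `Tree` class and the counting functions are our own illustrative names. A tree with no children is the basis case, and each count uses the same structural recursion as the proof.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A root node with subtrees T_1, ..., T_k; no children is the basis case."""
    children: List["Tree"] = field(default_factory=list)


def count_nodes(t: Tree) -> int:
    # Node N itself plus n_1 + n_2 + ... + n_k.
    return 1 + sum(count_nodes(c) for c in t.children)


def count_edges(t: Tree) -> int:
    # k edges from N to the roots of the T_i, plus e_1 + e_2 + ... + e_k.
    return len(t.children) + sum(count_edges(c) for c in t.children)


single = Tree()                                  # basis: n = 1, e = 0
small = Tree([Tree(), Tree()])                   # 3 nodes, 2 edges
t = Tree([small, Tree(), Tree([Tree([Tree()])])])

for tree in (single, small, t):
    n, e = count_nodes(tree), count_edges(tree)
    assert n == e + 1, (n, e)
print(count_nodes(t), count_edges(t))            # prints: 8 7
```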


Theorem 1.22: Every expression has an equal number of left and right paren- 
theses. 


PROOF: Formally, we prove the statement S(G) about any expression G that 
is defined by the recursion of Example 1.20: the numbers of left and right 
parentheses in G are the same. 


BASIS: If G is defined by the basis, then G is a number or variable. These 
expressions have 0 left parentheses and 0 right parentheses, so the numbers are 
equal. 


INDUCTION: There are three rules whereby expression G may have been con- 
structed according to the inductive step in the definition: 


1. G = E + F.
2. G = E * F.
3. G = (E).


We may assume that S(E) and S(F) are true; that is, E has the same number 
of left and right parentheses, say n of each, and F likewise has the same number 
of left and right parentheses, say m of each. Then we can compute the numbers 
of left and right parentheses in G for each of the three cases, as follows: 
1. If G = E + F, then G has n + m left parentheses and n + m right
parentheses; n of each come from E and m of each come from F.


2. If G = E * F, the count of parentheses for G is again n + m of each, for
the same reason as in case (1).


3. If G = (E), then there are n + 1 left parentheses in G: one left parenthesis
is explicitly shown, and the other n are present in E. Likewise, there are
n + 1 right parentheses in G; one is explicit and the other n are in E.


In each of the three cases, we see that the numbers of left and right parentheses 
in G are the same. This observation completes the inductive step and completes 
the proof. 
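The case analysis in the proof of Theorem 1.22 corresponds directly to a recursive function with one branch per rule. The sketch below encodes expressions as nested tuples. This is a hypothetical representation chosen only to keep the example short. The function returns the pair (left, right) of parenthesis counts.

```python
from typing import Tuple

# Hypothetical encoding of the expressions of Example 1.20:
#   ("atom", s)    basis: a number or a variable
#   ("+", E, F)    E + F
#   ("*", E, F)    E * F
#   ("()", E)      (E)


def paren_counts(g: tuple) -> Tuple[int, int]:
    """Return (left, right) parenthesis counts, one branch per rule."""
    op = g[0]
    if op == "atom":                      # basis: 0 of each
        return (0, 0)
    if op in ("+", "*"):                  # cases 1 and 2: n + m of each
        ln, rn = paren_counts(g[1])
        lm, rm = paren_counts(g[2])
        return (ln + lm, rn + rm)
    if op == "()":                        # case 3: n + 1 of each
        ln, rn = paren_counts(g[1])
        return (ln + 1, rn + 1)
    raise ValueError(f"not an expression: {g!r}")


def render(g: tuple) -> str:
    """Spell out the expression as a string."""
    op = g[0]
    if op == "atom":
        return g[1]
    if op in ("+", "*"):
        return f"{render(g[1])} {op} {render(g[2])}"
    return f"({render(g[1])})"


g = ("*", ("atom", "2"), ("()", ("+", ("atom", "x"), ("atom", "2"))))
left, right = paren_counts(g)
s = render(g)
# The structural counts agree with counting characters in the string.
assert (left, right) == (s.count("("), s.count(")"))
print(s, left, right)                     # prints: 2 * (x + 2) 1 1
```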


1.4.4 Mutual Inductions 


Sometimes, we cannot prove a single statement by induction, but rather need
to prove a group of statements $S_1(n), S_2(n), \ldots, S_k(n)$ together by induction
on n. Automata theory provides many such situations. In Example 1.23 we 
sample the common situation where we need to explain what an automaton 
does by proving a group of statements, one for each state. These statements 
tell under what sequences of inputs the automaton gets into each of the states. 

Strictly speaking, proving a group of statements is no different from proving
the conjunction (logical AND) of all the statements. For instance, the group
of statements $S_1(n), S_2(n), \ldots, S_k(n)$ could be replaced by the single statement
$S_1(n)$ AND $S_2(n)$ AND $\cdots$ AND $S_k(n)$. However, when there are really several
independent statements to prove, it is generally less confusing to keep the statements
separate and to prove them all in their own parts of the basis and inductive
steps. We call this sort of proof mutual induction. An example will illustrate
the necessary steps for a mutual induction.


Example 1.23: Let us revisit the on/off switch, which we represented as an 
automaton in Example 1.1. The automaton itself is reproduced as Fig. 1.8. 
Since pushing the button switches the state between on and off, and the switch 
starts out in the off state, we expect that the following statements will together 
explain the operation of the switch: 


$S_1(n)$: The automaton is in state off after n pushes if and only if n is even.
$S_2(n)$: The automaton is in state on after n pushes if and only if n is odd.


We might suppose that $S_1$ implies $S_2$ and vice-versa, since we know that
a number n cannot be both even and odd. However, what is not always true 
about an automaton is that it is in one and only one state. It happens that 
the automaton of Fig. 1.8 is always in exactly one state, but that fact must be 
proved as part of the mutual induction. 
Figure 1.8: Repeat of the automaton of Fig. 1.1 (two states, off and on, with off the start state; an arc labeled Push leads from each state to the other)


We give the basis and inductive parts of the proofs of statements $S_1(n)$ and
$S_2(n)$ below. The proofs depend on several facts about odd and even integers:
if we add or subtract 1 from an even integer, we get an odd integer, and if we 
add or subtract 1 from an odd integer we get an even integer. 


BASIS: For the basis, we choose n = 0. Since there are two statements, each of
which must be proved in both directions (because $S_1$ and $S_2$ are each
“if-and-only-if” statements), there are actually four cases to the basis, and four cases
to the induction as well.


1. [$S_1$; If] Since 0 is in fact even, we must show that after 0 pushes, the
automaton of Fig. 1.8 is in state off. Since that is the start state, the 
automaton is indeed in state off after 0 pushes. 


2. [$S_1$; Only-if] The automaton is in state off after 0 pushes, so we must
show that 0 is even. But 0 is even by definition of “even,” so there is 
nothing more to prove. 


3. [$S_2$; If] The hypothesis of the “if” part of $S_2$ is that 0 is odd. Since this
hypothesis H is false, any statement of the form “if H then C” is true, as 
we discussed in Section 1.3.2. Thus, this part of the basis also holds. 


4. [$S_2$; Only-if] The hypothesis, that the automaton is in state on after 0
pushes, is also false, since the only way to get to state on is by following 
an arc labeled Push, which requires that the button be pushed at least 
once. Since the hypothesis is false, we can again conclude that the if-then 
statement is true. 


INDUCTION: Now, we assume that $S_1(n)$ and $S_2(n)$ are true, and try to prove
$S_1(n+1)$ and $S_2(n+1)$. Again, the proof separates into four parts.


1. [$S_1(n+1)$; If] The hypothesis for this part is that n + 1 is even. Thus,
n is odd. The “if” part of statement $S_2(n)$ says that after n pushes, the
automaton is in state on. The arc from on to off labeled Push tells us
that the (n + 1)st push will cause the automaton to enter state off. That
completes the proof of the “if” part of $S_1(n+1)$.




CHAPTER 1. AUTOMATA: THE METHODS AND THE MADNESS 


2. [$S_1(n+1)$; Only-if] The hypothesis is that the automaton is in state off
after n + 1 pushes. Inspecting the automaton of Fig. 1.8 tells us that the
only way to get to state off after one or more moves is to be in state on and
receive an input Push. Thus, if we are in state off after n + 1 pushes, we
must have been in state on after n pushes. Then, we may use the “only-if”
part of statement $S_2(n)$ to conclude that n is odd. Consequently, n + 1 is
even, which is the desired conclusion for the only-if portion of $S_1(n+1)$.


3. [$S_2(n+1)$; If] This part is essentially the same as part (1), with the roles of
statements $S_1$ and $S_2$ exchanged, and with the roles of “odd” and “even”
exchanged. The reader should be able to construct this part of the proof
easily.


4. [$S_2(n+1)$; Only-if] This part is essentially the same as part (2), with the
roles of statements $S_1$ and $S_2$ exchanged, and with the roles of “odd” and
“even” exchanged.
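A program cannot replace the mutual induction. It can, however, check the two statements for small n and show what “exactly one state” means operationally. The sketch below tracks the set of states the switch could be in. It uses the transition table implied by Fig. 1.8, where Push moves off to on and on to off. For every n up to a bound, it tests $S_1(n)$, $S_2(n)$, and the single-state property. The names `TRANSITIONS`, `states_after`, and `check_up_to` are our own.

```python
from typing import Dict, Set, Tuple

# Transition table for the on/off switch of Fig. 1.8; "off" is the start state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("off", "Push"): "on",
    ("on", "Push"): "off",
}


def states_after(n: int) -> Set[str]:
    """The set of states the automaton can be in after n pushes."""
    current = {"off"}
    for _ in range(n):
        current = {TRANSITIONS[(q, "Push")] for q in current}
    return current


def check_up_to(limit: int) -> bool:
    """Check S1(n), S2(n), and 'exactly one state' for n = 0, 1, ..., limit."""
    for n in range(limit + 1):
        states = states_after(n)
        s1 = ("off" in states) == (n % 2 == 0)   # S1(n): off iff n even
        s2 = ("on" in states) == (n % 2 == 1)    # S2(n): on iff n odd
        if not (s1 and s2 and len(states) == 1):
            return False
    return True


print(check_up_to(100))   # prints: True
```

A passing check for n ≤ 100 is evidence, not proof. Only the induction covers every n.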


We can abstract from Example 1.23 the pattern for all mutual inductions: 


• Each of the statements must be proved separately in the basis and in the
inductive step.

• If the statements are “if-and-only-if,” then both directions of each
statement must be proved, both in the basis and in the induction.


1.5 The Central Concepts of Automata Theory 


In this section we shall introduce the most important definitions of terms that 
pervade the theory of automata. These concepts include the “alphabet” (a set 
of symbols), “strings” (a list of symbols from an alphabet), and “language” (a 
set of strings from the same alphabet). 


1.5.1 Alphabets 


An alphabet is a finite, nonempty set of symbols. Conventionally, we use the
symbol $\Sigma$ for an alphabet. Common alphabets include:


1. $\Sigma = \{0, 1\}$, the binary alphabet.
2. $\Sigma = \{a, b, \ldots, z\}$, the set of all lower-case letters.
3. The set of all ASCII characters, or the set of all printable ASCII characters.
1.5.2 Strings


A string (or sometimes word) is a finite sequence of symbols chosen from some 
alphabet. For example, 01101 is a string from the binary alphabet $\Sigma = \{0, 1\}$.
The string 111 is another string chosen from this alphabet. 


The Empty String 


The empty string is the string with zero occurrences of symbols. This string, 
denoted $\epsilon$, is a string that may be chosen from any alphabet whatsoever.


Length of a String 


It is often useful to classify strings by their length, that is, the number of 
positions for symbols in the string. For instance, 01101 has length 5. It is 
common to say that the length of a string is “the number of symbols” in the 
string; this statement is colloquially accepted but not strictly correct. Thus, 
there are only two symbols, 0 and 1, in the string 01101, but there are five 
positions for symbols, and its length is 5. However, you should generally expect 
that “the number of symbols” can be used when “number of positions” is meant. 

The standard notation for the length of a string w is |w|. For example,
|011| = 3 and $|\epsilon| = 0$.


Powers of an Alphabet 


If $\Sigma$ is an alphabet, we can express the set of all strings of a certain length from
that alphabet by using an exponential notation. We define $\Sigma^k$ to be the set of
strings of length k, each of whose symbols is in $\Sigma$.


Example 1.24: Note that $\Sigma^0 = \{\epsilon\}$, regardless of what alphabet $\Sigma$ is. That
is, $\epsilon$ is the only string whose length is 0.
If $\Sigma = \{0, 1\}$, then $\Sigma^1 = \{0, 1\}$, $\Sigma^2 = \{00, 01, 10, 11\}$,

$$\Sigma^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$$

and so on. Note that there is a slight confusion between $\Sigma$ and $\Sigma^1$. The former
is an alphabet; its members 0 and 1 are symbols. The latter is a set of strings;
its members are the strings 0 and 1, each of which is of length 1. We shall not
try to use separate notations for the two sets, relying on context to make it
clear whether {0,1} or similar sets are alphabets or sets of strings.
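Computationally, $\Sigma^k$ is the k-fold Cartesian product of the alphabet with itself, with each tuple of symbols joined into a string. The following is a minimal Python sketch. It assumes the alphabet is given as a list of one-character symbols, and `power` is our own name. Strings come out in the order the symbols are listed, which reproduces the listings above.

```python
from itertools import product
from typing import List


def power(sigma: List[str], k: int) -> List[str]:
    """Sigma^k: every string of length k whose symbols come from sigma."""
    # product(..., repeat=0) yields exactly one empty tuple, so power(sigma, 0)
    # is [""], matching Sigma^0 = {epsilon}.
    return ["".join(symbols) for symbols in product(sigma, repeat=k)]


binary = ["0", "1"]
for k in range(4):
    print(k, power(binary, k))
# 0 ['']
# 1 ['0', '1']
# 2 ['00', '01', '10', '11']
# 3 ['000', '001', '010', '011', '100', '101', '110', '111']
```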


The set of all strings over an alphabet $\Sigma$ is conventionally denoted $\Sigma^*$. For
instance, $\{0, 1\}^* = \{\epsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$. Put another way,

$$\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$$


Sometimes, we wish to exclude the empty string from the set of strings. The
set of nonempty strings from alphabet $\Sigma$ is denoted $\Sigma^+$. Thus, two appropriate
equivalences are:


• $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots$.

• $\Sigma^* = \Sigma^+ \cup \{\epsilon\}$.


Type Convention for Symbols and Strings


Commonly, we shall use lower-case letters at the beginning of the alphabet
(or digits) to denote symbols, and lower-case letters near the end of the
alphabet, typically w, x, y, and z, to denote strings. You should try to get
used to this convention, to help remind you of the types of the elements
being discussed.
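Because $\Sigma^*$ is infinite, it cannot be stored as a list. The union $\Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$, however, gives a natural order in which to generate it: all strings of length 0, then length 1, and so on. The generator below is a sketch that assumes the alphabet is a nonempty sequence of one-character symbols, as the definition of an alphabet requires. `strings_over` is our own name. Starting at length 1 instead of 0 yields $\Sigma^+$.

```python
from itertools import count, islice, product
from typing import Iterator, Sequence


def strings_over(sigma: Sequence[str], nonempty: bool = False) -> Iterator[str]:
    """Lazily enumerate Sigma* (or Sigma+ when nonempty=True), shortest first."""
    for k in count(1 if nonempty else 0):
        # Emit all of Sigma^k before moving on to Sigma^(k+1).
        for symbols in product(sigma, repeat=k):
            yield "".join(symbols)


print(list(islice(strings_over("01"), 8)))
# ['', '0', '1', '00', '01', '10', '11', '000']
print(list(islice(strings_over("01", nonempty=True), 3)))
# ['0', '1', '00']
```

Every string in $\Sigma^*$ appears after finitely many steps, even though the generator never terminates.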


Concatenation of Strings 


Let x and y be strings. Then xy denotes the concatenation of x and y, that
is, the string formed by making a copy of x and following it by a copy of y.
More precisely, if x is the string composed of i symbols $x = a_1 a_2 \cdots a_i$ and y is
the string composed of j symbols $y = b_1 b_2 \cdots b_j$, then xy is the string of length
$i + j$: $xy = a_1 a_2 \cdots a_i b_1 b_2 \cdots b_j$.


Example 1.25: Let x = 01101 and y = 110. Then xy = 01101110 and
yx = 11001101. For any string w, the equations $\epsilon w = w \epsilon = w$ hold. That is,
$\epsilon$ is the identity for concatenation, since when concatenated with any string it
yields the other string as a result (analogously to the way 0, the identity for
addition, can be added to any number x and yields x as a result).
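In most programming languages, concatenation of strings is ordinary string concatenation. The short Python check below is an illustration, not part of the formal development. It reproduces Example 1.25 and confirms that $|xy| = |x| + |y|$ and that the empty string is the identity. `concat` and `EPSILON` are our own names.

```python
def concat(x: str, y: str) -> str:
    """xy: a copy of x followed by a copy of y."""
    return x + y


EPSILON = ""          # the empty string

x, y = "01101", "110"
xy, yx = concat(x, y), concat(y, x)
print(xy, yx)         # prints: 01101110 11001101

assert len(xy) == len(x) + len(y)                   # |xy| = i + j
assert xy != yx                                     # order matters
for w in (x, y, EPSILON):
    assert concat(EPSILON, w) == concat(w, EPSILON) == w
```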


1.5.3 Languages 


A set of strings all of which are chosen from some $\Sigma^*$, where $\Sigma$ is a particular
alphabet, is called a language. If $\Sigma$ is an alphabet, and $L \subseteq \Sigma^*$, then L is a
language over $\Sigma$. Notice that a language over $\Sigma$ need not include strings with
all the symbols of $\Sigma$, so once we have established that L is a language over $\Sigma$,
we also know it is a language over any alphabet that is a superset of $\Sigma$.

The choice of the term “language” may seem strange. However, common 
languages can be viewed as sets of strings. An example is English, where the 
collection of legal English words is a set of strings over the alphabet that consists 
of all the letters. Another example is C, or any other programming language, 
where the legal programs are a subset of the possible strings that can be formed 
from the alphabet of the language. This alphabet is a subset of the ASCII 
characters. The exact alphabet may differ slightly among different programming 
languages, but generally includes the upper- and lower-case letters, the digits, 
punctuation, and mathematical symbols. 

However, there are also many other languages that appear when we study 
automata. Some are abstract examples, such as: 
1. The language of all strings consisting of n 0’s followed by n 1’s, for some
$n \ge 0$: $\{\epsilon, 01, 0011, 000111, \ldots\}$.


2. The set of strings of 0’s and 1’s with an equal number of each:

$$\{\epsilon, 01, 10, 0011, 0101, 1001, \ldots\}$$


3. The set of binary numbers whose value is a prime:

$$\{10, 11, 101, 111, 1011, \ldots\}$$

4. $\Sigma^*$ is a language for any alphabet $\Sigma$.

5. $\emptyset$, the empty language, is a language over any alphabet.


6. $\{\epsilon\}$, the language consisting of only the empty string, is also a language
over any alphabet. Notice that $\emptyset \neq \{\epsilon\}$; the former has no strings and
the latter has one string.


The only important constraint on what can be a language is that all alphabets 
are finite. Thus languages, although they can have an infinite number of strings, 
are restricted to consist of strings drawn from one fixed, finite alphabet. 
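Computationally, a language can be represented by a membership test: a function that returns True exactly for the strings in the language. The Python predicates below are illustrative sketches for examples 1, 2, 5, and 6, and the function names are our own. The tests for $\emptyset$ and $\{\epsilon\}$ differ on the empty string, which is the distinction drawn in item 6.

```python
def in_zeros_then_ones(w: str) -> bool:
    """Example 1: n 0's followed by n 1's, for some n >= 0."""
    n, odd = divmod(len(w), 2)
    return not odd and w == "0" * n + "1" * n


def in_equal_counts(w: str) -> bool:
    """Example 2: strings of 0's and 1's with an equal number of each."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")


def in_empty_language(w: str) -> bool:
    """Example 5: the empty language contains no strings at all."""
    return False


def in_epsilon_language(w: str) -> bool:
    """Example 6: {epsilon} contains exactly one string, the empty one."""
    return w == ""


for w in ["", "01", "10", "0011", "0101", "001"]:
    print(repr(w), in_zeros_then_ones(w), in_equal_counts(w))
```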


1.5.4 Problems 


In automata theory, a problem is the question of deciding whether a given string 
is a member of some particular language. It turns out, as we shall see, that 
anything we more colloquially call a “problem” can be expressed as membership 
in a language. More precisely, if $\Sigma$ is an alphabet, and L is a language over $\Sigma$,
then the problem L is:


• Given a string w in $\Sigma^*$, decide whether or not w is in L.


Example 1.26: The problem of testing primality can be expressed by the
language $L_p$, consisting of all binary strings whose value as a binary number is
a prime. That is, given a string of 0’s and 1’s, say “yes” if the string is the
binary representation of a prime and say “no” if not. For some strings, this
decision is easy. For instance, 0011101 cannot be the representation of a prime,
for the simple reason that every integer except 0 has a binary representation
that begins with 1. However, it is less obvious whether the string 11101 belongs
to $L_p$, so any solution to this problem will have to use significant computational
resources of some kind: time and/or space, for example.
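One straightforward decider for $L_p$ works in two steps. It first rejects any string that is not a binary representation beginning with 1, which is the observation used for 0011101. It then tests the value by trial division. This is a sketch, not an efficient algorithm. In the worst case, trial division up to $\sqrt{n}$ takes on the order of $2^{|w|/2}$ steps for a string w, which illustrates the kind of resource cost the example mentions. `in_Lp` is our own name.

```python
def in_Lp(w: str) -> bool:
    """Decide whether the binary string w represents a prime."""
    # Reject strings that are empty, contain symbols other than 0 and 1,
    # or begin with 0 (no prime is written that way).
    if not w or w[0] != "1" or any(c not in "01" for c in w):
        return False
    n = int(w, 2)
    if n < 2:
        return False
    d = 2
    while d * d <= n:          # trial division up to sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True


for w in ["10", "11", "100", "0011101", "11101"]:
    print(w, in_Lp(w))
```

This decider accepts 11101, whose value is 29.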


One potentially unsatisfactory aspect of our definition of “problem” is that
one commonly thinks of problems not as decision questions (is or is not the
following true?) but as requests to compute or transform some input (find the
best way to do this task). For instance, the task of the parser in a C compiler
can be thought of as a problem in our formal sense, where one is given an ASCII
string and asked to decide whether or not the string is a member of $L_C$, the set
of valid C programs. However, the parser does more than decide. It produces a
parse tree, entries in a symbol table, and perhaps more. Worse, the compiler as
a whole solves the problem of turning a C program into object code for some
machine, which is far from simply answering “yes” or “no” about the validity
of a program.


Set-Formers as a Way to Define Languages


It is common to describe a language using a “set-former”:
{w | something about w}


This expression is read “the set of words w such that (whatever is said
about w to the right of the vertical bar).” Examples are:


1. {w | w consists of an equal number of 0’s and 1’s}.
2. {w | w is a binary integer that is prime}.
3. {w | w is a syntactically correct C program}.


It is also common to replace w by some expression with parameters and
describe the strings in the language by stating conditions on the
parameters. Here are some examples; the first with parameter n, the second with
parameters i and j:


1. $\{0^n1^n \mid n \ge 1\}$. Read “the set of 0 to the n 1 to the n such that n
is greater than or equal to 1,” this language consists of the strings
{01, 0011, 000111, ...}. Notice that, as with alphabets, we can raise
a single symbol to a power n in order to represent n copies of that
symbol.


2. $\{0^i1^j \mid 0 \le i \le j\}$. This language consists of strings with some 0’s
(possibly none) followed by at least as many 1’s.
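A set-former $\{w \mid P(w)\}$ pairs naturally with a predicate P. Because the language may be infinite, the sketch below lists only the members up to a chosen length. It takes candidates from $\Sigma^0, \Sigma^1, \ldots$ in order. The helper `set_former` is our own, not a standard function. The predicate shown is one way to test membership in $\{0^i1^j \mid 0 \le i \le j\}$. It uses a regular expression to split w into a block of 0’s followed by a block of 1’s.

```python
import re
from itertools import product
from typing import Callable, List


def set_former(pred: Callable[[str], bool], sigma: str, max_len: int) -> List[str]:
    """Members of {w | pred(w)} over sigma of length <= max_len, shortest first."""
    members = []
    for k in range(max_len + 1):
        for symbols in product(sigma, repeat=k):
            w = "".join(symbols)
            if pred(w):
                members.append(w)
    return members


def zeros_then_at_least_as_many_ones(w: str) -> bool:
    """Predicate for {0^i 1^j | 0 <= i <= j}."""
    m = re.fullmatch(r"(0*)(1*)", w)
    return m is not None and len(m.group(1)) <= len(m.group(2))


print(set_former(zeros_then_at_least_as_many_ones, "01", 3))
# ['', '1', '01', '11', '011', '111']
```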


Nevertheless, the definition of “problems” as languages has stood the test
of time as the appropriate way to deal with the important questions of
complexity theory. In this theory, we are interested in proving lower bounds on
the complexity of certain problems. Especially important are techniques for
proving that certain problems cannot be solved in an amount of time that is
less than exponential in the size of their input. It turns out that the yes/no,
or language-based, versions of known problems are just as hard in this sense as
their “solve this” versions.


Is It a Language or a Problem?


Languages and problems are really the same thing. Which term we prefer
to use depends on our point of view. When we care only about strings for
their own sake, e.g., in the set $\{0^n1^n \mid n \ge 1\}$, then we tend to think of
the set of strings as a language. In the last chapters of this book, we shall
tend to assign “semantics” to the strings, e.g., think of strings as coding
graphs, logical expressions, or even integers. In those cases, where we care
more about the thing represented by the string than the string itself, we
shall tend to think of a set of strings as a problem.

That is, if we can prove it is hard to decide whether a given string belongs to
the language $L_X$ of valid strings in programming language X, then it stands to
reason that it will not be easier to translate programs in language X to object
code. For if it were easy to generate code, then we could run the translator, and
conclude that the input was a valid member of $L_X$ exactly when the translator
succeeded in producing object code. Since the final step of determining whether
object code has been produced cannot be hard, we can use the fast algorithm
for generating the object code to decide membership in $L_X$ efficiently. We thus
contradict the assumption that testing membership in $L_X$ is hard. We have a
proof by contradiction of the statement “if testing membership in $L_X$ is hard,
then compiling programs in programming language X is hard.”
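The argument can be phrased as a small program transformation. Given any translator for language X, we wrap it to obtain a decider for $L_X$ whose only extra work is checking whether object code came out. In this sketch the translator’s interface is a hypothetical assumption: a function that returns object code as bytes, or None when the input is not a valid program. `toy_translate` is a stand-in used only to exercise the wrapper.

```python
from typing import Callable, Optional

# Hypothetical translator interface: object code for a valid program, else None.
Translator = Callable[[str], Optional[bytes]]


def decider_from_translator(translate: Translator) -> Callable[[str], bool]:
    """Reduce membership in L_X to translation: w is in L_X iff code is produced."""
    def decide(w: str) -> bool:
        # Beyond running the translator, the only work is this constant-time
        # check, so a fast translator would yield a fast decider.
        return translate(w) is not None
    return decide


def toy_translate(program: str) -> Optional[bytes]:
    """Stand-in 'language X': nonempty strings of decimal digits."""
    if program and all(c in "0123456789" for c in program):
        return program.encode("ascii")           # pretend object code
    return None


in_LX = decider_from_translator(toy_translate)
print(in_LX("2024"), in_LX("20x4"), in_LX(""))   # prints: True False False
```

The wrapper never inspects how the translator works. That independence is what lets hardness of the decision problem transfer to the translation problem.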

This technique, showing one problem hard by using its supposed efficient 
algorithm to solve efficiently another problem that is already known to be hard, 
is called a “reduction” of the second problem to the first. It is an essential tool 
in the study of the complexity of problems, and it is facilitated greatly by our 
notion that problems are questions about membership in a language, rather 
than more general kinds of questions. 


1.6 Summary of Chapter 1 


✦ Finite Automata: Finite automata involve states and transitions among
states in response to inputs. They are useful for building several different 
kinds of software, including the lexical analysis component of a compiler 
and systems for verifying the correctness of circuits or protocols, for ex- 
ample. 


✦ Regular Expressions: These are a structural notation for describing the
same patterns that can be represented by finite automata. They are used 
in many common types of software, including tools to search for patterns 
in text or in file names, for instance. 
✦ Context-Free Grammars: These are an important notation for describing
the structure of programming languages and related sets of strings; they 
are used to build the parser component of a compiler. 


✦ Turing Machines: These are automata that model the power of real
computers. They allow us to study decidability, the question of what can or
cannot be done by a computer. They also let us distinguish tractable 
problems — those that can be solved in polynomial time — from the 
intractable problems — those that cannot. 


✦ Deductive Proofs: This basic method of proof proceeds by listing
statements that are either given to be true, or that follow logically from some
of the previous statements. 


✦ Proving If-Then Statements: Many theorems are of the form “if
(something) then (something else).” The statement or statements following the
“if” are the hypothesis, and what follows “then” is the conclusion. Deduc- 
tive proofs of if-then statements begin with the hypothesis, and continue 
with statements that follow logically from the hypothesis and previous 
statements, until the conclusion is proved as one of the statements. 


✦ Proving If-And-Only-If Statements: There are other theorems of the form
“(something) if and only if (something else).” They are proved by showing 
if-then statements in both directions. A similar kind of theorem claims 
the equality of the sets described in two different ways; these are proved 
by showing that each of the two sets is contained in the other. 


✦ Proving the Contrapositive: Sometimes, it is easier to prove a statement
of the form “if H then C” by proving the equivalent statement: “if not 
C then not H.” The latter is called the contrapositive of the former. 


✦ Proof by Contradiction: Other times, it is more convenient to prove the
statement “if H then C” by proving “if H and not C then (something 
known to be false).” A proof of this type is called proof by contradiction. 


✦ Counterexamples: Sometimes we are asked to show that a certain
statement is not true. If the statement has one or more parameters, then we
can show it is false as a generality by providing just one counterexam- 
ple, that is, one assignment of values to the parameters that makes the 
statement false. 


✦ Inductive Proofs: A statement that has an integer parameter n can often
be proved by induction on n. We prove the statement is true for the 
basis, a finite number of cases for particular values of n, and then prove 
the inductive step: that if the statement is true for values up to n, then 
it is true for n + 1. 
✦ Structural Inductions: In some situations, including many in this book,
the theorem to be proved inductively is about some recursively defined 
construct, such as trees. We may prove a theorem about the constructed 
objects by induction on the number of steps used in its construction. This 
type of induction is referred to as structural. 


✦ Alphabets: An alphabet is any finite, nonempty set of symbols.

✦ Strings: A string is a finite-length sequence of symbols.


✦ Languages and Problems: A language is a (possibly infinite) set of strings,
all of which choose their symbols from some one alphabet. When the 
strings of a language are to be interpreted in some way, the question of 
whether a string is in the language is sometimes called a problem. 


1.7 Gradiance Problems for Chapter 1 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 1.1: Find in the list below the expression that is the contrapositive of
A AND (NOT B) → C OR (NOT D). Note: the hypothesis and conclusion
of the choices in the list below may have some simple logical rules applied to 
them, in order to simplify the expressions. 


Problem 1.2: To prove A AND (NOT B) → C OR (NOT D) by contradiction,
which of the statements below would we prove? Note: each of the
choices is simplified by pushing NOT’s down until they apply only to atomic 
statements A through D. 


Problem 1.3: Suppose we want to prove the statement S(n): “If $n \ge 2$, the
sum of the integers 2 through n is $(n + 2)(n - 1)/2$” by induction on n. To
prove the inductive step, we can make use of the fact that


$$2 + 3 + 4 + \cdots + (n + 1) = (2 + 3 + 4 + \cdots + n) + (n + 1)$$


Find, in the list below, an equality that we may prove to conclude the inductive
part.


Problem 1.4: The length of the string X [shown on-line by the Gradiance 
system from a stock of choices] is: 


Problem 1.5: What is the concatenation of X and Y? [strings shown on-line 
by the Gradiance system from a stock of choices] 
Problem 1.6: The binary string X [shown on-line by the Gradiance system]
is a member of which of the following problems? Remember, a “problem” is a
language whose strings represent the cases of a problem that have the answer
“yes.” In this question, you should assume that all languages are sets of binary
strings interpreted as base-2 integers. The exception is the problem of finding 
palindromes, which are strings that are identical when reversed, like 0110110, 
regardless of their numerical value. 


1.8 References for Chapter 1 


For extended coverage of the material of this chapter, including mathematical 
concepts underlying Computer Science, we recommend [1]. 


1. A. V. Aho and J. D. Ullman, Foundations of Computer Science, Computer 
Science Press, New York, 1994. 


Chapter 2 


Finite Automata 


This chapter introduces the class of languages known as “regular languages.” 
These languages are exactly the ones that can be described by finite automata, 
which we sampled briefly in Section 1.1.1. After an extended example that will 
provide motivation for the study to follow, we define finite automata formally. 

As was mentioned earlier, a finite automaton has a set of states, and its 
“control” moves from state to state in response to external “inputs.” One of 
the crucial distinctions among classes of finite automata is whether that con- 
trol is “deterministic,” meaning that the automaton cannot be in more than 
one state at any one time, or “nondeterministic,” meaning that it may be in 
several states at once. We shall discover that adding nondeterminism does 
not let us define any language that cannot be defined by a deterministic finite 
automaton, but there can be substantial efficiency in describing an application 
using a nondeterministic automaton. In effect, nondeterminism allows us to 
“program” solutions to problems using a higher-level language. The nondeter- 
ministic finite automaton is then “compiled,” by an algorithm we shall learn 
in this chapter, into a deterministic automaton that can be “executed” on a 
conventional computer. 

We conclude the chapter with a study of an extended nondeterministic aut- 
omaton that has the additional choice of making a transition from one state to 
another spontaneously, i.e., on the empty string as “input.” These automata 
also accept nothing but the regular languages. However, we shall find them 
quite important in Chapter 3, when we study regular expressions and their 
equivalence to automata. 

The study of the regular languages continues in Chapter 3. There, we in- 
troduce another important way to describe regular languages: the algebraic 
notation known as regular expressions. After discussing regular expressions, 
and showing their equivalence to finite automata, we use both automata and 
regular expressions as tools in Chapter 4 to show certain important properties 
of the regular languages. Examples of such properties are the “closure” proper- 
ties, which allow us to claim that one language is regular because one or more 
other languages are known to be regular, and “decision” properties. The latter
are algorithms to answer questions about automata or regular expressions, e.g., 
whether two automata or expressions represent the same language. 


2.1 An Informal Picture of Finite Automata 


In this section, we shall study an extended example of a real-world problem 
whose solution uses finite automata in an important role. We investigate pro- 
tocols that support “electronic money” — files that a customer can use to pay 
for goods on the internet, and that the seller can receive with assurance that 
the “money” is real. The seller must know that the file has not been forged, 
nor has it been copied and sent to the seller, while the customer retains a copy 
of the same file to spend again. 

The nonforgeability of the file is something that must be assured by a bank 
and by a cryptography policy. That is, a third player, the bank, must issue and 
encrypt the “money” files, so that forgery is not a problem. However, the bank 
has a second important job: it must keep a database of all the valid money 
that it has issued, so that it can verify to a store that the file it has received 
represents real money and can be credited to the store’s account. We shall not 
address the cryptographic aspects of the problem, nor shall we worry about 
how the bank can store and retrieve what could be billions of “electronic dollar 
bills.” These problems are not likely to represent long-term impediments to the 
concept of electronic money, and examples of its small-scale use have existed 
since the late 1990’s. 

However, in order to use electronic money, protocols need to be devised to 
allow the manipulation of the money in a variety of ways that the users want. 
Because monetary systems always invite fraud, we must verify whatever policy 
we adopt regarding how money is used. That is, we need to prove the only 
things that can happen are things we intend to happen — things that do not 
allow an unscrupulous user to steal from others or to “manufacture” money. 
In the balance of this section, we shall introduce a very simple example of a 
(bad) electronic-money protocol, model it with finite automata, and show how 
constructions on automata can be used to verify protocols (or, in this case, to 
discover that the protocol has a bug). 


2.1.1 The Ground Rules 


There are three participants: the customer, the store, and the bank. We assume 
for simplicity that there is only one “money” file in existence. The customer 
may decide to transfer this money file to the store, which will then redeem the 
file from the bank (i.e., get the bank to issue a new money file belonging to the 
store rather than the customer) and ship goods to the customer. In addition, 
the customer has the option to cancel the file. That is, the customer may ask 
the bank to place the money back in the customer’s account, making the money 
no longer spendable. Interaction among the three participants is thus limited
to five events: 


1. The customer may decide to pay. That is, the customer sends the money 
to the store. 


2. The customer may decide to cancel. The money is sent to the bank with 
a message that the value of the money is to be added to the customer’s 
bank account. 


3. The store may ship goods to the customer. 


4. The store may redeem the money. That is, the money is sent to the bank 
with a request that its value be given to the store. 


5. The bank may transfer the money by creating a new, suitably encrypted 
money file and sending it to the store. 


2.1.2 The Protocol 


The three participants must design their behaviors carefully, or the wrong things 
may happen. In our example, we make the reasonable assumption that the 
customer cannot be relied upon to act responsibly. In particular, the customer 
may try to copy the money file, use it to pay several times, or both pay and 
cancel the money, thus getting the goods “for free.” 

The bank must behave responsibly, or it cannot be a bank. In particular, it 
must make sure that two stores cannot both redeem the same money file, and 
it must not allow money to be both canceled and redeemed. The store should 
be careful as well. In particular, it should not ship goods until it is sure it has 
been given valid money for the goods. 

Protocols of this type can be represented as finite automata. Each state 
represents a situation that one of the participants could be in. That is, the state 
“remembers” that certain important events have happened and that others have 
not yet happened. Transitions between states occur when one of the five events
described above occurs. We shall think of these events as “external” to the
automata representing the three participants, even though each participant is 
responsible for initiating one or more of the events. It turns out that what is 
important about the problem is what sequences of events can happen, not who 
is allowed to initiate them. 

Figure 2.1 represents the three participants by automata. In that diagram, 
we show only the events that affect a participant. For example, the action pay 
affects only the customer and store. The bank does not know that the money 
has been sent by the customer to the store; it discovers that fact only when the 
store executes the action redeem. 

Figure 2.1: Finite automata representing a customer, a store, and a bank

Let us examine first the automaton (c) for the bank. The start state is
state 1; it represents the situation where the bank has issued the money file in
question but has not been requested either to redeem it or to cancel it. If a
cancel request is sent to the bank by the customer, then the bank restores the
money to the customer’s account and enters state 2. The latter state represents
the situation where the money has been cancelled. The bank, being responsible,
will not leave state 2 once it is entered, since the bank must not allow the same
money to be cancelled again or spent by the customer.¹

Alternatively, when in state 1 the bank may receive a redeem request from 
the store. If so, it goes to state 3, and shortly sends the store a transfer message, 
with a new money file that now belongs to the store. After sending the transfer 
message, the bank goes to state 4. In that state, it will neither accept cancel or 
redeem requests nor will it perform any other actions regarding this particular 
money file. 
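As a rough illustration, the bank automaton of Fig. 2.1(c) can be encoded as a partial transition function: a Python dictionary keyed by (state, event) pairs, where a missing key means the bank has no arc for that event. The names `BANK` and `run` are illustrative assumptions, not part of the text. The sketch only shows that, with just the arcs of Fig. 2.1(c), a redeem arriving after a cancel leaves the bank with nowhere to go.

```python
from typing import Optional

# Partial transition function of the bank automaton of Fig. 2.1(c).
# A missing (state, event) key means the bank has no arc for that event.
BANK = {
    (1, "cancel"): 2,    # money restored to the customer's account
    (1, "redeem"): 3,    # the store asked for the money
    (3, "transfer"): 4,  # a new money file is sent to the store
}


def run(delta: dict, start: int, events: list[str]) -> Optional[int]:
    """Follow the events from start; return None if the automaton dies."""
    state = start
    for event in events:
        if (state, event) not in delta:
            return None  # no arc labeled with this event: the automaton "dies"
        state = delta[(state, event)]
    return state


print(run(BANK, 1, ["redeem", "transfer"]))  # 4
print(run(BANK, 1, ["cancel", "redeem"]))    # None: state 2 has no redeem arc
```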

Now, let us consider Fig. 2.1(a), the automaton representing the actions of 
the store. While the bank always does the right thing, the store’s system has 
some defects. Imagine that the shipping and financial operations are done by 
separate processes, so there is the opportunity for the ship action to be done 
either before, after, or during the redemption of the electronic money. That 
policy allows the store to get into a situation where it has already shipped the 
goods and then finds out the money was bogus. 

The store starts out in state a. When the customer orders the goods by
performing the pay action, the store enters state b. In this state, the store
begins both the shipping and redemption processes. If the goods are shipped
first, then the store enters state c, where it must still redeem the money from
the bank and receive the transfer of an equivalent money file from the bank.
Alternatively, the store may send the redeem message first, entering state d.
From state d, the store might next ship, entering state e, or it might next
receive the transfer of money from the bank, entering state f. From state f, we
expect that the store will eventually ship, putting the store in state g, where the
transaction is complete and nothing more will happen. In state e, the store is
waiting for the transfer from the bank. Unfortunately, the goods have already
been shipped, and if the transfer never occurs, the store is out of luck.

¹You should remember that this entire discussion is about one single money file. The bank
will in fact be running the same protocol with a large number of electronic pieces of money,
but the workings of the protocol are the same for each of them, so we can discuss the problem
as if there were only one piece of electronic money in existence.

Last, observe the automaton for the customer, Fig. 2.1(b). This automaton 
has only one state, reflecting the fact that the customer “can do anything.” 
The customer can perform the pay and cancel actions any number of times, in 
any order, and stays in the lone state after each action. 


2.1.3 Enabling the Automata to Ignore Actions 


While the three automata of Fig. 2.1 reflect the behaviors of the three partici- 
pants independently, there are certain transitions that are missing. For example, 
the store is not affected by a cancel message, so if the cancel action is performed 
by the customer, the store should remain in whatever state it is in. However, in 
the formal definition of a finite automaton, which we shall study in Section 2.2, 
whenever an input X is received by an automaton, the automaton must follow 
an arc labeled X from the state it is in to some new state. Thus, the automaton 
for the store needs an additional arc from each state to itself, labeled cancel. 
Then, whenever the cancel action is executed, the store automaton can make a 
“transition” on that input, with the effect that it stays in the same state it was 
in. Without these additional arcs, whenever the cancel action was executed the 
store automaton would “die”; that is, the automaton would be in no state at 
all, and further actions by that automaton would be impossible. 

Another potential problem is that one of the participants may, intentionally 
or erroneously, send an unexpected message, and we do not want this action to 
cause one of the automata to die. For instance, suppose the customer decided 
to execute the pay action a second time, while the store was in state e. Since 
that state has no arc out with label pay, the store’s automaton would die before 
it could receive the transfer from the bank. In summary, we must add to the 
automata of Fig. 2.1 loops on certain states, with labels for all those actions 
that must be ignored when in that state; the complete automata are shown 
in Fig. 2.2. To save space, we combine the labels onto one arc, rather than 
showing several arcs with the same heads and tails but different labels. The 
two kinds of actions that must be ignored are: 


1. Actions that are irrelevant to the participant involved. As we saw, the
only irrelevant action for the store is cancel, so each of its seven states
has a loop labeled cancel. For the bank, both pay and ship are irrelevant,
so we have put at each of the bank’s states an arc labeled pay, ship. For
the customer, ship, redeem and transfer are all irrelevant, so we add arcs
with these labels. In effect, it stays in its one state on any sequence of
inputs, so the customer automaton has no effect on the operation of the
overall system. Of course, the customer is still a participant, since it is
the customer who initiates the pay and cancel actions. However, as we
mentioned, the matter of who initiates actions has nothing to do with the
behavior of the automata.


2. Actions that must not be allowed to kill an automaton. As mentioned, we
must not allow the customer to kill the store’s automaton by executing pay
again, so we have added loops with label pay to all but state a (where the
pay action is expected and relevant). We have also added loops with labels
cancel to states 3 and 4 of the bank, in order to prevent the customer from
killing the bank’s automaton by trying to cancel money that has already
been redeemed. The bank properly ignores such a request. Likewise,
states 3 and 4 have loops on redeem. The store should not try to redeem
the same money twice, but if it does, the bank properly ignores the second
request.


Figure 2.2: The complete sets of transitions for the three automata
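The effect of these loops can be sketched in Python, assuming a dictionary encoding of the store automaton of Fig. 2.1(a) and a hypothetical helper `add_loops` that inserts a self-loop for each (state, event) pair that should be ignored. Following the two rules above, cancel is ignored in all seven states and a repeated pay in every state except a. A second pay while the store waits in state e kills the original automaton but not the completed one.

```python
from typing import Optional

# Store automaton of Fig. 2.1(a): only the arcs for relevant, expected events.
STORE = {
    ("a", "pay"): "b",
    ("b", "ship"): "c", ("b", "redeem"): "d",
    ("c", "redeem"): "e",
    ("d", "ship"): "e", ("d", "transfer"): "f",
    ("e", "transfer"): "g",
    ("f", "ship"): "g",
}
STATES = "abcdefg"


def add_loops(delta: dict, ignore: dict) -> dict:
    """Return a copy of delta with a self-loop for each ignored (state, event)."""
    completed = dict(delta)
    for state, events in ignore.items():
        for event in events:
            # Never overwrite a real arc; only fill in a missing one.
            completed.setdefault((state, event), state)
    return completed


def run(delta: dict, start: str, events: list[str]) -> Optional[str]:
    """Follow the events from start; return None if the automaton dies."""
    state = start
    for event in events:
        if (state, event) not in delta:
            return None
        state = delta[(state, event)]
    return state


# cancel is irrelevant to every store state; a repeated pay is ignored
# everywhere except state a, where pay is expected.
IGNORE = {s: ["cancel"] + (["pay"] if s != "a" else []) for s in STATES}
STORE_COMPLETE = add_loops(STORE, IGNORE)

events = ["pay", "redeem", "ship", "pay", "transfer"]  # second pay in state e
print(run(STORE, "a", events))           # None: the Fig. 2.1(a) store dies
print(run(STORE_COMPLETE, "a", events))  # g
```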
2.1.4 The Entire System as an Automaton


While we now have models for how the three participants behave, we do not 
yet have a representation for the interaction of the three participants. As men- 
tioned, because the customer has no constraints on behavior, that automaton 
has only one state, and any sequence of events lets it stay in that state; i.e., it is 
not possible for the system as a whole to “die” because the customer automaton 
has no response to an action. However, both the store and bank behave in a 
complex way, and it is not immediately obvious in what combinations of states 
these two automata can be. 

The normal way to explore the interaction of automata such as these is to
construct the product automaton. Each state of that automaton represents a pair
of states, one from the store and one from the bank. For instance, the state (3, d)
of the product automaton represents the situation where the bank is in state
3, and the store is in state d. Since the bank has four states and the store has
seven, the product automaton has 4 × 7 = 28 states.

We show the product automaton in Fig. 2.3. For clarity, we have arranged 
the 28 states in an array. The row corresponds to the state of the bank and 
the column to the state of the store. To save space, we have also abbreviated 
the labels on the arcs, with P, S, C, R, and T standing for pay, ship, cancel, 
redeem, and transfer, respectively. 


Figure 2.3: The product automaton for the store and bank


To construct the arcs of the product automaton, we need to run the bank 
and store automata “in parallel.” Each of the two components of the product 
automaton independently makes transitions on the various inputs. However, it 
is important to notice that if an input action is received, and one of the two 
automata has no state to go to on that input, then the product automaton
“dies”; it has no state to go to. 

To make this rule for state transitions precise, suppose the product automaton
is in state (i, x). That state corresponds to the situation where the bank
is in state i and the store in state x. Let Z be one of the input actions. We
look at the automaton for the bank, and see whether there is a transition out
of state i with label Z. Suppose there is, and it leads to state j (which might
be the same as i if the bank loops on input Z). Then, we look at the store and
see if there is an arc labeled Z out of state x leading to some state y. If both j
and y exist, then the product automaton has an arc from state (i, x) to state
(j, y), labeled Z. If either of states j or y does not exist (because the bank or
store has no arc out of i or x, respectively, for input Z), then there is no arc
out of (i, x) labeled Z.
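The rule above translates almost line for line into code. The following is a minimal sketch, assuming the complete automata of Fig. 2.2 are encoded as dictionaries keyed by (state, event), with the loops described in Section 2.1.3; `product` is a hypothetical helper name. It also counts the arcs labeled redeem, which the next paragraph discusses.

```python
# Complete bank and store automata of Fig. 2.2 as partial transition dicts.
BANK = {}
for s in (1, 2, 3, 4):                 # pay and ship are irrelevant to the bank
    BANK[(s, "pay")] = BANK[(s, "ship")] = s
for s in (3, 4):                       # ignore late cancel and redeem requests
    BANK[(s, "cancel")] = BANK[(s, "redeem")] = s
BANK.update({(1, "cancel"): 2, (1, "redeem"): 3, (3, "transfer"): 4})

STORE = {}
for s in "abcdefg":                    # cancel is irrelevant to the store
    STORE[(s, "cancel")] = s
    if s != "a":                       # a repeated pay must not kill the store
        STORE[(s, "pay")] = s
STORE.update({
    ("a", "pay"): "b", ("b", "ship"): "c", ("b", "redeem"): "d",
    ("c", "redeem"): "e", ("d", "ship"): "e", ("d", "transfer"): "f",
    ("e", "transfer"): "g", ("f", "ship"): "g",
})

EVENTS = ("pay", "ship", "cancel", "redeem", "transfer")


def product(bank: dict, store: dict, events=EVENTS) -> dict:
    """Arcs of the product: (i, x) --Z--> (j, y) iff both component arcs exist."""
    bank_states = {s for s, _ in bank}
    store_states = {s for s, _ in store}
    arcs = {}
    for i in bank_states:
        for x in store_states:
            for z in events:
                if (i, z) in bank and (x, z) in store:
                    arcs[((i, x), z)] = (bank[(i, z)], store[(x, z)])
    return arcs


ARCS = product(BANK, STORE)
print(ARCS[((1, "b"), "redeem")])                   # (3, 'd')
print(ARCS[((4, "c"), "redeem")])                   # (4, 'e')
print(sum(1 for (_, z) in ARCS if z == "redeem"))   # 6
```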

We can now see how the arcs of Fig. 2.3 were selected. For instance, on 
input pay, the store goes from state a to b, but stays put if it is in any other 
state besides a. The bank stays in whatever state it is in when the input is 
pay, because that action is irrelevant to the bank. This observation explains 
the four arcs labeled P at the left ends of the four rows in Fig. 2.3, and the 
loops labeled P on other states. 

For another example of how the arcs are selected, consider the input redeem. 
If the bank receives a redeem message when in state 1, it goes to state 3. If in 
states 3 or 4, it stays there, while in state 2 the bank automaton dies; i.e., it has 
nowhere to go. The store, on the other hand, can make transitions from state 
b to d or from c to e when the redeem input is received. In Fig. 2.3, we see six 
arcs labeled redeem, corresponding to the six combinations of three bank states 
and two store states that have outward-bound arcs labeled R. For example, in 
state (1, b), the arc labeled R takes the automaton to state (3, d), since redeem 
takes the bank from state 1 to 3 and the store from b to d. As another example, 
there is an arc labeled R from (4,c) to (4,e), since redeem takes the bank from 
state 4 back to state 4, while it takes the store from state c to state e. 


2.1.5 Using the Product Automaton to Validate the 
Protocol 


Figure 2.3 tells us some interesting things. For instance, of the 28 states, only 
ten of them can be reached from the start state, which is (1,a) — the combi- 
nation of the start states of the bank and store automata. Notice that states 
like (2,e) and (4, d) are not accessible, that is, there is no path to them from 
the start state. Inaccessible states need not be included in the automaton; we
included them in this example just to be systematic.
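Accessibility can be checked mechanically with a breadth-first search from (1, a). This self-contained sketch rebuilds the Fig. 2.2 automata under the same dictionary encoding (an assumption of the sketch, not the book's notation) and applies the product rule on the fly in a hypothetical `step` function.

```python
from collections import deque

# Complete automata of Fig. 2.2, keyed by (state, event); a missing key means no arc.
BANK = {}
for s in (1, 2, 3, 4):
    BANK[(s, "pay")] = BANK[(s, "ship")] = s
for s in (3, 4):
    BANK[(s, "cancel")] = BANK[(s, "redeem")] = s
BANK.update({(1, "cancel"): 2, (1, "redeem"): 3, (3, "transfer"): 4})

STORE = {}
for s in "abcdefg":
    STORE[(s, "cancel")] = s
    if s != "a":
        STORE[(s, "pay")] = s
STORE.update({
    ("a", "pay"): "b", ("b", "ship"): "c", ("b", "redeem"): "d",
    ("c", "redeem"): "e", ("d", "ship"): "e", ("d", "transfer"): "f",
    ("e", "transfer"): "g", ("f", "ship"): "g",
})
EVENTS = ("pay", "ship", "cancel", "redeem", "transfer")


def step(pair, z):
    """Product transition on event z, or None if either component dies."""
    i, x = pair
    if (i, z) in BANK and (x, z) in STORE:
        return (BANK[(i, z)], STORE[(x, z)])
    return None


def reachable(start=(1, "a")):
    """Breadth-first search over the product automaton from its start state."""
    seen = {start}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        for z in EVENTS:
            nxt = step(pair, z)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


R = reachable()
print(len(R))                        # 10
print((2, "e") in R, (4, "d") in R)  # False False
```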

However, the real purpose of analyzing a protocol such as this one using 
automata is to ask and answer questions that mean “can the following type 
of error occur?” In the example at hand, we might ask whether it is possible 
that the store can ship goods and never get paid. That is, can the product 
automaton get into a state in which the store has shipped (that is, the state is 
in column c, e, or g), and yet no transition on input T was ever made or will
be made?

For instance, in state (3,e), the goods have shipped, but there will eventu- 
ally be a transition on input T to state (4,g). In terms of what the bank is 
doing, once it has gotten to state 3, it has received the redeem request and pro- 
cessed it. That means it must have been in state 1 before receiving the redeem 
and therefore the cancel message had not been received and will be ignored if 
received in the future. Thus, the bank will eventually perform the transfer of 
money to the store. 

However, state (2,c) is a problem. The state is accessible, but the only arc 
out leads back to that state. This state corresponds to a situation where the 
bank received a cancel message before a redeem message. However, the store 
received a pay message; i.e., the customer was being duplicitous and has both 
spent and canceled the same money. The store foolishly shipped before trying 
to redeem the money, and when the store does execute the redeem action, the 
bank will not even acknowledge the message, because it is in state 2, where it 
has canceled the money and will not process a redeem request. 
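The question “can the store ship and never be paid?” can also be answered by search. This sketch assumes the store has been paid exactly when it is in state f or g, since it can only reach those states after a transfer. A reachable product state with the store in column c, e, or g is flagged if no state with the store in f or g is reachable from it. The dictionary encoding and function names are our own assumptions.

```python
from collections import deque

# Complete automata of Fig. 2.2, keyed by (state, event); a missing key means no arc.
BANK = {}
for s in (1, 2, 3, 4):
    BANK[(s, "pay")] = BANK[(s, "ship")] = s
for s in (3, 4):
    BANK[(s, "cancel")] = BANK[(s, "redeem")] = s
BANK.update({(1, "cancel"): 2, (1, "redeem"): 3, (3, "transfer"): 4})

STORE = {}
for s in "abcdefg":
    STORE[(s, "cancel")] = s
    if s != "a":
        STORE[(s, "pay")] = s
STORE.update({
    ("a", "pay"): "b", ("b", "ship"): "c", ("b", "redeem"): "d",
    ("c", "redeem"): "e", ("d", "ship"): "e", ("d", "transfer"): "f",
    ("e", "transfer"): "g", ("f", "ship"): "g",
})
EVENTS = ("pay", "ship", "cancel", "redeem", "transfer")


def reachable_from(start):
    """All product states reachable from start (start included)."""
    seen, queue = {start}, deque([start])
    while queue:
        i, x = queue.popleft()
        for z in EVENTS:
            if (i, z) in BANK and (x, z) in STORE:
                nxt = (BANK[(i, z)], STORE[(x, z)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


SHIPPED = {"c", "e", "g"}  # store columns in which the goods have been shipped
PAID = {"f", "g"}          # store columns reachable only after a transfer

bad = [p for p in reachable_from((1, "a"))
       if p[1] in SHIPPED
       and not any(q[1] in PAID for q in reachable_from(p))]
print(bad)  # [(2, 'c')]
```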


2.2 Deterministic Finite Automata 


Now it is time to present the formal notion of a finite automaton, so that we 
may start to make precise some of the informal arguments and descriptions that 
we saw in Sections 1.1.1 and 2.1. We begin by introducing the formalism of a 
deterministic finite automaton, one that is in a single state after reading any 
sequence of inputs. The term “deterministic” refers to the fact that on each 
input there is one and only one state to which the automaton can transition from 
its current state. In contrast, “nondeterministic” finite automata, the subject of 
Section 2.3, can be in several states at once. The term “finite automaton” will 
refer to the deterministic variety, although we shall use “deterministic” or the 
abbreviation DFA normally, to remind the reader of which kind of automaton 
we are talking about. 


2.2.1 Definition of a Deterministic Finite Automaton 
A deterministic finite automaton consists of: 

1. A finite set of states, often denoted Q. 

2. A finite set of input symbols, often denoted Σ.


3. A transition function that takes as arguments a state and an input symbol
and returns a state. The transition function will commonly be denoted δ.
In our informal graph representation of automata, δ was represented by
arcs between states and the labels on the arcs. If q is a state, and a is an
input symbol, then δ(q, a) is that state p such that there is an arc labeled
a from q to p.²


4. A start state, one of the states in Q. 
5. A set of final or accepting states F. The set F is a subset of Q. 


A deterministic finite automaton will often be referred to by its acronym: DFA. 
The most succinct representation of a DFA is a listing of the five components 
above. In proofs we often talk about a DFA in “five-tuple” notation: 


$$A = (Q, \Sigma, \delta, q_0, F)$$


where A is the name of the DFA, Q is its set of states, Σ its input symbols, δ
its transition function, q0 its start state, and F its set of accepting states.
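One way to mirror the five-tuple in code is a small immutable record whose fields correspond to Q, Σ, δ, q0, and F. The sketch below makes a representation assumption (δ as a dictionary keyed by (state, symbol) pairs), and its validation checks the conditions in the definition: δ returns a state for every state and input symbol, the start state is in Q, and F is a subset of Q. The two-state automaton at the end is a hypothetical example, not one from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """A DFA as the five-tuple (Q, Sigma, delta, q0, F)."""
    states: frozenset     # Q
    alphabet: frozenset   # Sigma
    delta: dict           # maps (state, symbol) -> state
    start: object         # q0
    accepting: frozenset  # F

    def __post_init__(self):
        # delta must return a state for every (state, input symbol) pair.
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta({q!r}, {a!r}) is missing or not a state")
        if self.start not in self.states:
            raise ValueError("the start state must be in Q")
        if not self.accepting <= self.states:
            raise ValueError("F must be a subset of Q")


# Hypothetical example: strings over {0, 1} with an odd number of 1's.
odd_ones = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    accepting=frozenset({"odd"}),
)
```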


2.2.2 How a DFA Processes Strings 


The first thing we need to understand about a DFA is how the DFA decides 
whether or not to “accept” a sequence of input symbols. The “language” of 
the DFA is the set of all strings that the DFA accepts. Suppose $a_1 a_2 \cdots a_n$ is a
sequence of input symbols. We start out with the DFA in its start state, $q_0$. We
consult the transition function δ, say $\delta(q_0, a_1) = q_1$, to find the state that the
DFA A enters after processing the first input symbol $a_1$. We process the next
input symbol, $a_2$, by evaluating $\delta(q_1, a_2)$; let us suppose this state is $q_2$. We
continue in this manner, finding states $q_3, q_4, \ldots, q_n$ such that
$\delta(q_{i-1}, a_i) = q_i$ for each i. If $q_n$ is a member of F, then the input
$a_1 a_2 \cdots a_n$ is accepted, and if not then it is “rejected.”
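The procedure above is a single loop over the input. The sketch below assumes δ is a dictionary keyed by (state, symbol) pairs; `accepts` and the two-state automaton `ENDS_IN_1` (strings ending in 1) are hypothetical illustrations, not taken from the text.

```python
def accepts(delta: dict, start, accepting: set, w: str) -> bool:
    """Run a DFA on w one symbol at a time and report acceptance."""
    q = start                  # begin in the start state q0
    for a in w:                # q_i = delta(q_{i-1}, a_i)
        q = delta[(q, a)]
    return q in accepting      # accept iff q_n is in F


# Hypothetical DFA over {0, 1}: state t means "the last symbol read was 1".
ENDS_IN_1 = {("s", "0"): "s", ("s", "1"): "t",
             ("t", "0"): "s", ("t", "1"): "t"}
print(accepts(ENDS_IN_1, "s", {"t"}, "0101"))  # True
print(accepts(ENDS_IN_1, "s", {"t"}, "10"))    # False
```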


Example 2.1: Let us formally specify a DFA that accepts all and only the 
strings of 0’s and 1’s that have the sequence 01 somewhere in the string. We 
can write this language L as: 


{w | w is of the form x01y for some strings
x and y consisting of 0’s and 1’s only} 


Another equivalent description, using parameters x and y to the left of the 
vertical bar, is: 


{x01y | x and y are any strings of 0’s and 1’s} 


Examples of strings in the language include 01, 11010, and 100011. Examples 
of strings not in the language include ε, 0, and 111000.

What do we know about an automaton that can accept this language L? 
First, its input alphabet is Σ = {0, 1}. It has some set of states, Q, of which
one, say q0, is the start state. This automaton has to remember the important
facts about what inputs it has seen so far. To decide whether 01 is a substring 
of the input, A needs to remember: 


²More accurately, the graph is a picture of some transition function δ, and the arcs of the
graph are constructed to reflect the transitions specified by δ.
1. Has it already seen 01? If so, then it accepts every sequence of further
inputs; i.e., it will only be in accepting states from now on. 


2. Has it never seen 01, but its most recent input was 0, so if it now sees a 
1, it will have seen 01 and can accept everything it sees from here on? 


3. Has it never seen 01, but its last input was either nonexistent (it just 
started) or it last saw a 1? In this case, A cannot accept until it first sees 
a 0 and then sees a 1 immediately after. 


These three conditions can each be represented by a state. Condition (3) is
represented by the start state, q0. Surely, when just starting, we need to see
a 0 and then a 1. But if in state q0 we next see a 1, then we are no closer to
seeing 01, and so we must stay in state q0. That is, δ(q0, 1) = q0.

However, if we are in state q0 and we next see a 0, we are in condition (2).
That is, we have never seen 01, but we have our 0. Thus, let us use q2 to
represent condition (2). Our transition from q0 on input 0 is δ(q0, 0) = q2.

Now, let us consider the transitions from state q2. If we see a 0, we are no
better off than we were, but no worse either. We have not seen 01, but 0 was
the last symbol, so we are still waiting for a 1. State q2 describes this situation
perfectly, so we want δ(q2, 0) = q2. If we are in state q2 and we see a 1 input,
we now know there is a 0 followed by a 1. We can go to an accepting state,
which we shall call q1, and which corresponds to condition (1) above. That is,
δ(q2, 1) = q1.

Finally, we must design the transitions for state q1. In this state, we have
already seen a 01 sequence, so regardless of what happens, we shall still be in
a situation where we’ve seen 01. That is, δ(q1, 0) = δ(q1, 1) = q1.

Thus, Q = {q0, q1, q2}. As we said, q0 is the start state, and the only
accepting state is q1; that is, F = {q1}. The complete specification of the
automaton A that accepts the language L of strings that have a 01 substring
is


$$A = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta, q_0, \{q_1\})$$


where δ is the transition function described above.
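The transitions derived above can be written out as data and checked against the example strings of the language. This is a sketch with illustrative names (`DELTA`, `accepts`); the final column of the printout compares the DFA's verdict with Python's substring test.

```python
# Transition function delta of Example 2.1.
# q0: no 01 yet and last symbol not 0; q2: no 01 yet, last symbol 0; q1: seen 01.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
}
START, ACCEPTING = "q0", {"q1"}


def accepts(w: str) -> bool:
    """True iff the DFA of Example 2.1 ends in an accepting state on w."""
    q = START
    for a in w:
        q = DELTA[(q, a)]
    return q in ACCEPTING


for w in ["01", "11010", "100011", "", "0", "111000"]:
    print(repr(w), accepts(w), "01" in w)  # the last two columns agree
```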


2.2.3 Simpler Notations for DFA’s 


Specifying a DFA as a five-tuple with a detailed description of the δ transition
function is both tedious and hard to read. There are two preferred notations
for describing automata: 


1. A transition diagram, which is a graph such as the ones we saw in Sec- 
tion 2.1. 


2. A transition table, which is a tabular listing of the δ function, which by
implication tells us the set of states and the input alphabet.




Transition Diagrams 


A transition diagram for a DFA A = (Q, Σ, δ, q0, F) is a graph defined as follows:
a) For each state in Q there is a node. 


b) For each state q in Q and each input symbol a in Σ, let δ(q, a) = p.
Then the transition diagram has an arc from node q to node p, labeled 
a. If there are several input symbols that cause transitions from q to p, 
then the transition diagram can have one arc, labeled by the list of these 
symbols. 


c) There is an arrow into the start state q0, labeled Start. This arrow does
not originate at any node. 


d) Nodes corresponding to accepting states (those in F) are marked by a 
double circle. States not in F have a single circle. 
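The four rules map naturally onto the Graphviz DOT language. The following is a hedged sketch, not a tool used in the text: `to_dot` is a hypothetical helper that only produces DOT source as a string (rendering it would require Graphviz, assumed to be installed separately), and it assumes state names are valid DOT identifiers. Because a DOT arrow must start at some node, rule (c) is approximated by an arrow from a node that draws nothing.

```python
from collections import defaultdict


def to_dot(states, alphabet, delta, start, accepting) -> str:
    """Render a DFA as Graphviz DOT text following rules (a)-(d)."""
    lines = ["digraph DFA {", "  rankdir=LR;"]
    # (a) one node per state; (d) accepting states get a double circle.
    for q in sorted(states):
        shape = "doublecircle" if q in accepting else "circle"
        lines.append(f"  {q} [shape={shape}];")
    # (b) one arc from q to p, labeled by the list of all symbols taking q to p.
    arcs = defaultdict(list)
    for q in sorted(states):
        for a in sorted(alphabet):
            arcs[(q, delta[(q, a)])].append(a)
    for (q, p), symbols in arcs.items():
        label = ",".join(symbols)
        lines.append(f'  {q} -> {p} [label="{label}"];')
    # (c) the Start arrow leaves a node that draws nothing, not a real state.
    lines.append('  __start [shape=none, label=""];')
    lines.append(f'  __start -> {start} [label="Start"];')
    lines.append("}")
    return "\n".join(lines)


# The DFA of Example 2.1 (strings containing 01), written out as data.
DELTA = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}
print(to_dot({"q0", "q1", "q2"}, {"0", "1"}, DELTA, "q0", {"q1"}))
```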


Example 2.2: Figure 2.4 shows the transition diagram for the DFA that we 
designed in Example 2.1. We see in that diagram the three nodes that cor- 
respond to the three states. There is a Start arrow entering the start state, 
q0, and the one accepting state, q1, is represented by a double circle. Out of
each state is one arc labeled 0 and one arc labeled 1 (although the two arcs
are combined into one with a double label in the case of q1). The arcs each
correspond to one of the δ facts developed in Example 2.1.


Figure 2.4: The transition diagram for the DFA accepting all strings with a
substring 01


Transition Tables 


A transition table is a conventional, tabular representation of a function like δ
that takes two arguments and returns a value. The rows of the table correspond
to the states, and the columns correspond to the inputs. The entry for the row
corresponding to state q and the column corresponding to input a is the state
δ(q, a).
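A transition table is straightforward to produce from a dictionary encoding of δ. This sketch uses a hypothetical `transition_table` helper, with the ASCII markers -> and * standing in for the arrow and star described in Example 2.3; the states are passed as a list so the caller controls the row order. It is applied to the DFA of Example 2.1.

```python
def transition_table(states, alphabet, delta, start, accepting) -> str:
    """Format delta with one row per state and one column per input symbol.

    The start state is marked with -> and accepting states with *.
    """
    symbols = sorted(alphabet)
    width = max(len(str(x)) for x in list(states) + symbols) + 2
    header = " " * (3 + width) + "".join(f"{a:>{width}}" for a in symbols)
    lines = [header]
    for q in states:
        mark = ("->" if q == start else "  ") + ("*" if q in accepting else " ")
        cells = "".join(f"{delta[(q, a)]:>{width}}" for a in symbols)
        lines.append(f"{mark}{q:<{width}}{cells}")
    return "\n".join(lines)


# The DFA of Example 2.1 (strings containing 01).
DELTA = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}
print(transition_table(["q0", "q1", "q2"], {"0", "1"}, DELTA, "q0", {"q1"}))
```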


Example 2.3: The transition table corresponding to the function δ of
Example 2.1 is shown in Fig. 2.5. We have also shown two other features of a
transition table. The start state is marked with an arrow, and the accepting
states are marked with a star. Since we can deduce the sets of states and
input symbols by looking at the row and column heads, we can now read from
the transition table all the information we need to specify the finite automaton
uniquely.


              0     1
    →  q0    q2    q0
      *q1    q1    q1
       q2    q2    q1

Figure 2.5: Transition table for the DFA of Example 2.1


2.2.4 Extending the Transition Function to Strings 


We have explained informally that the DFA defines a language: the set of all 
strings that result in a sequence of state transitions from the start state to an 
accepting state. In terms of the transition diagram, the language of a DFA 
is the set of labels along all the paths that lead from the start state to any 
accepting state. 

Now, we need to make the notion of the language of a DFA precise. To do
so, we define an extended transition function that describes what happens when
we start in any state and follow any sequence of inputs. If δ is our transition
function, then the extended transition function constructed from δ will be called
$\hat{\delta}$. The extended transition function is a function that takes a state q and a
string w and returns a state p: the state that the automaton reaches when
starting in state q and processing the sequence of inputs w. We define $\hat{\delta}$ by
induction on the length of the input string, as follows:


BASIS: $\hat{\delta}(q, \epsilon) = q$. That is, if we are in state q and read no inputs, then we
are still in state q.


INDUCTION: Suppose w is a string of the form xa; that is, a is the last symbol
of w, and x is the string consisting of all but the last symbol.³ For example,
w = 1101 is broken into x = 110 and a = 1. Then


$$\hat{\delta}(q, w) = \delta\bigl(\hat{\delta}(q, x), a\bigr) \tag{2.1}$$


Now (2.1) may seem like a lot to take in, but the idea is simple. To compute
$\hat{\delta}(q, w)$, first compute $\hat{\delta}(q, x)$, the state that the automaton is in after processing
all but the last symbol of w. Suppose this state is p; that is, $\hat{\delta}(q, x) = p$. Then
$\hat{\delta}(q, w)$ is what we get by making a transition from state p on input a, the last
symbol of w. That is, $\hat{\delta}(q, w) = \delta(p, a)$.
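The inductive definition translates directly into a recursive function. In this sketch `delta_hat` is a hypothetical name, δ is the transition function of Example 2.1 stored as a dictionary, and the string is split as w = xa exactly as in the INDUCTION step. Because each call recurses on a string one symbol shorter, very long inputs would exceed Python's default recursion limit; a symbol-by-symbol loop computes the same state.

```python
# Transition function delta of Example 2.1, keyed by (state, symbol).
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}


def delta_hat(q: str, w: str) -> str:
    """Extended transition function, defined by induction on the length of w."""
    if w == "":
        return q                # BASIS: delta_hat(q, epsilon) = q
    x, a = w[:-1], w[-1]        # INDUCTION: w = xa, with a the last symbol
    p = delta_hat(q, x)         # the state after all but the last symbol
    return DELTA[(p, a)]        # equation (2.1): delta(delta_hat(q, x), a)


print(delta_hat("q0", "1101"))  # q1
```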
³Recall our convention that letters at the beginning of the alphabet are symbols, and those
near the end of the alphabet are strings. We need that convention to make sense of the phrase
“of the form xa.”
Example 2.4: Let us design a DFA to accept the language
L = {w | w has both an even number of 0’s and an even number of 1’s} 


It should not be surprising that the job of the states of this DFA is to count 
both the number of 0’s and the number of 1’s, but count them modulo 2. That 
is, the state is used to remember whether the number of 0’s seen so far is even or 
odd, and also to remember whether the number of 1’s seen so far is even or odd. 
There are thus four states, which can be given the following interpretations: 


q0: Both the number of 0’s seen so far and the number of 1’s seen so far are
even. 


q1: The number of 0’s seen so far is even, but the number of 1’s seen so far is
odd. 


q2: The number of 1’s seen so far is even, but the number of 0’s seen so far is 
odd. 


q3: Both the number of 0’s seen so far and the number of 1’s seen so far are 
odd. 


State q0 is both the start state and the lone accepting state. It is the start
state, because before reading any inputs, the numbers of 0’s and 1’s seen so 
far are both zero, and zero is even. It is the only accepting state, because it 
describes exactly the condition for a sequence of 0’s and 1’s to be in language 
L. 


Figure 2.6: Transition diagram for the DFA of Example 2.4 


We now know almost everything we need to specify the DFA for language L. It is


$$A = (\{q_0, q_1, q_2, q_3\}, \{0, 1\}, \delta, q_0, \{q_0\})$$


where the transition function δ is described by the transition diagram of Fig. 2.6.
Notice how each input 0 causes the state to cross the horizontal, dashed line. 
Thus, after seeing an even number of 0’s we are always above the line, in state 
q0 or q1, while after seeing an odd number of 0’s we are always below the line,
in state q2 or q3. Likewise, every 1 causes the state to cross the vertical, dashed
line. Thus, after seeing an even number of 1’s, we are always to the left, in state
q0 or q2, while after seeing an odd number of 1’s we are to the right, in state q1
or q3. These observations are an informal proof that the four states have the 
interpretations attributed to them. However, one could prove the correctness 
of our claims about the states formally, by a mutual induction in the spirit of 
Example 1.23. 

We can also represent this DFA by a transition table. Figure 2.7 shows this 
table. However, we are not just concerned with the design of this DFA; we 
want to use it to illustrate the construction of $\hat{\delta}$ from its transition function δ.
Suppose the input is 110101. Since this string has even numbers of 0’s and 1’s
both, we expect it is in the language. Thus, we expect that $\hat{\delta}(q_0, 110101) = q_0$,
since q0 is the only accepting state. Let us now verify that claim.


              0     1
    → *q0    q2    q1
       q1    q3    q0
       q2    q0    q3
       q3    q1    q2

Figure 2.7: Transition table for the DFA of Example 2.4


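The check involves computing $\hat{\delta}(q_0, w)$ for each prefix w of 110101, starting
at ε and going in increasing size. The summary of this calculation is:


• $\hat{\delta}(q_0, \epsilon) = q_0$.

• $\hat{\delta}(q_0, 1) = \delta(\hat{\delta}(q_0, \epsilon), 1) = \delta(q_0, 1) = q_1$.

• $\hat{\delta}(q_0, 11) = \delta(\hat{\delta}(q_0, 1), 1) = \delta(q_1, 1) = q_0$.

• $\hat{\delta}(q_0, 110) = \delta(\hat{\delta}(q_0, 11), 0) = \delta(q_0, 0) = q_2$.

• $\hat{\delta}(q_0, 1101) = \delta(\hat{\delta}(q_0, 110), 1) = \delta(q_2, 1) = q_3$.

• $\hat{\delta}(q_0, 11010) = \delta(\hat{\delta}(q_0, 1101), 0) = \delta(q_3, 0) = q_1$.

• $\hat{\delta}(q_0, 110101) = \delta(\hat{\delta}(q_0, 11010), 1) = \delta(q_1, 1) = q_0$.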
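The same prefix-by-prefix calculation can be replayed in code. The sketch assumes the transition function of Example 2.4 as read from the diagram description above (each 0 swaps q0 with q2 and q1 with q3; each 1 swaps q0 with q1 and q2 with q3); `trace` is an illustrative name.

```python
# Transition function of Example 2.4, keyed by (state, symbol).
# q0: even 0's, even 1's; q1: even 0's, odd 1's;
# q2: odd 0's, even 1's;  q3: odd 0's, odd 1's.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q3", ("q1", "1"): "q0",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q1", ("q3", "1"): "q2",
}


def trace(q: str, w: str) -> list[str]:
    """The states delta_hat(q, prefix) for every prefix of w, shortest first."""
    states = [q]                  # the prefix epsilon
    for a in w:
        states.append(DELTA[(states[-1], a)])
    return states


for i, state in enumerate(trace("q0", "110101")):
    print(repr("110101"[:i]), state)
```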
Standard Notation and Local Variables


After reading this section, you might imagine that our customary notation 
is required; that is, you must use δ for the transition function, use A for
the name of a DFA, and so on. We tend to use the same variables to
denote the same thing across all examples, because it helps to remind you
of the types of variables, much the way a variable i in a program is almost
always of integer type. However, we are free to call the components of an 
automaton, or anything else, anything we wish. Thus, you are free to call 
a DFA M and its transition function T if you like. 

Moreover, you should not be surprised that the same variable means 
different things in different contexts. For example, the DFA’s of Examples 
2.1 and 2.4 both were given a transition function called δ. However, the
two transition functions are each local variables, belonging only to their 
examples. These two transition functions are very different and bear no 
relationship to one another. 


2.2.5 The Language of a DFA 


Now, we can define the language of a DFA A = (Q, Σ, δ, q0, F). This language
is denoted L(A), and is defined by


$$L(A) = \{w \mid \hat{\delta}(q_0, w) \text{ is in } F\}$$


That is, the language of A is the set of strings w that take the start state q0 to
one of the accepting states. If L is L(A) for some DFA A, then we say L is a 
regular language. 


Example 2.5: As we mentioned earlier, if A is the DFA of Example 2.1, then 
L(A) is the set of all strings of 0’s and 1’s that contain a substring 01. If A is 
instead the DFA of Example 2.4, then L(A) is the set of all strings of 0’s and 
1’s whose numbers of 0’s and 1’s are both even. 
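For a finite sanity check of Example 2.5, one can compare membership in L(A) against the set-builder description on every string up to some length. The sketch below does this for the DFA of Example 2.4 on all strings of length at most 10; the length bound is an arbitrary assumption, and such a test is evidence, not a proof.

```python
from itertools import product

# DFA of Example 2.4: q0 is both the start state and the only accepting state.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q3", ("q1", "1"): "q0",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q1", ("q3", "1"): "q2",
}
START, ACCEPTING = "q0", {"q0"}


def in_language(w: str) -> bool:
    """w is in L(A) iff delta_hat(q0, w) is in F."""
    q = START
    for a in w:
        q = DELTA[(q, a)]
    return q in ACCEPTING


def all_strings(max_len: int):
    """Every string over {0, 1} of length 0 through max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


mismatches = [w for w in all_strings(10)
              if in_language(w) != (w.count("0") % 2 == 0 and w.count("1") % 2 == 0)]
print(len(mismatches))  # 0
```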


2.2.6 Exercises for Section 2.2 


Exercise 2.2.1: In Fig. 2.8 is a marble-rolling toy. A marble is dropped at 
A or B. Levers x1, x2, and x3 cause the marble to fall either to the left or to
the right. Whenever a marble encounters a lever, it causes the lever to reverse 
after the marble passes, so the next marble will take the opposite branch. 


* a) Model this toy by a finite automaton. Let the inputs A and B represent 
the input into which the marble is dropped. Let acceptance correspond 
to the marble exiting at D; nonacceptance represents a marble exiting at 
C. 
Figure 2.8: A marble-rolling toy


! b) Informally describe the language of the automaton. 


c) Suppose that instead the levers switched before allowing the marble to 
pass. How would your answers to parts (a) and (b) change? 


*! Exercise 2.2.2: We defined $\hat{\delta}$ by breaking the input string into any string
followed by a single symbol (in the inductive part, Equation 2.1). However, we
informally think of $\hat{\delta}$ as describing what happens along a path with a certain
string of labels, and if so, then it should not matter how we break the input
string in the definition of $\hat{\delta}$. Show that in fact, $\hat{\delta}(q, xy) = \hat{\delta}(\hat{\delta}(q, x), y)$ for any
state q and strings x and y. Hint: Perform an induction on |y|.


! Exercise 2.2.3: Show that for any state q, string x, and input symbol a,
$\hat{\delta}(q, ax) = \hat{\delta}(\delta(q, a), x)$. Hint: Use Exercise 2.2.2.


Exercise 2.2.4: Give DFA’s accepting the following languages over the alpha- 
bet {0, 1}: 


* a) The set of all strings ending in 00. 


b) The set of all strings with three consecutive 0’s (not necessarily at the 
end). 


c) The set of strings with 011 as a substring. 


! Exercise 2.2.5: Give DFA’s accepting the following languages over the alpha- 
bet {0,1}: 


a) The set of all strings such that each block of five consecutive symbols 
contains at least two 0’s. 
b) The set of all strings whose tenth symbol from the right end is a 1.


c) The set of strings that either begin or end (or both) with 01. 


d) The set of strings such that the number of 0’s is divisible by five, and the 
number of 1’s is divisible by 3. 


!! Exercise 2.2.6: Give DFA’s accepting the following languages over the alpha- 
bet {0, 1}: 


* a) The set of all strings beginning with a 1 that, when interpreted as a binary 
integer, is a multiple of 5. For example, strings 101, 1010, and 1111 are 
in the language; 0, 100, and 111 are not. 


b) The set of all strings that, when interpreted in reverse as a binary inte- 
ger, is divisible by 5. Examples of strings in the language are 0, 10011, 
1001100, and 0101. 


Exercise 2.2.7: Let A be a DFA and q a particular state of A, such that
$\delta(q, a) = q$ for all input symbols a. Show by induction on the length of the
input that for all input strings w, $\hat{\delta}(q, w) = q$.


Exercise 2.2.8: Let A be a DFA and a a particular input symbol of A, such 
that for all states q of A we have $\delta(q, a) = q$.


a) Show by induction on n that for all $n \ge 0$, $\hat{\delta}(q, a^n) = q$, where $a^n$ is the
string consisting of n a’s. 


b) Show that either $\{a\}^* \subseteq L(A)$ or $\{a\}^* \cap L(A) = \emptyset$.


* Exercise 2.2.9: Let $A = (Q, \Sigma, \delta, q_0, \{q_f\})$ be a DFA, and suppose that for all
a in $\Sigma$ we have $\delta(q_0, a) = \delta(q_f, a)$.


a) Show that for all $w \neq \epsilon$ we have $\hat{\delta}(q_0, w) = \hat{\delta}(q_f, w)$.


b) Show that if x is a nonempty string in L(A), then for all $k > 0$, $x^k$ (i.e.,
x written k times) is also in L(A).


* Exercise 2.2.10: Consider the DFA with the following transition table:

          0   1
    →A    A   B
    *B    B   A


Informally describe the language accepted by this DFA, and prove by induction 
on the length of an input string that your description is correct. Hint: When 
setting up the inductive hypothesis, it is wise to make a statement about what 
inputs get you to each state, not just what inputs get you to the accepting 
state. 
Exercise 2.2.11: Repeat Exercise 2.2.10 for the following transition table:

          0   1
    →A    B   A
    *B    C   A
     C    C   C


2.3 Nondeterministic Finite Automata 


A “nondeterministic” finite automaton (NFA) has the power to be in several 
states at once. This ability is often expressed as an ability to “guess” something 
about its input. For instance, when the automaton is used to search for certain 
sequences of characters (e.g., keywords) in a long text string, it is helpful to 
“guess” that we are at the beginning of one of those strings and use a sequence of 
states to do nothing but check that the string appears, character by character. 
We shall see an example of this type of application in Section 2.4. 

Before examining applications, we need to define nondeterministic finite 
automata and show that each one accepts a language that is also accepted by 
some DFA. That is, the NFA’s accept exactly the regular languages, just as 
DFA’s do. However, there are reasons to think about NFA’s. They are often 
more succinct and easier to design than DFA’s. Moreover, while we can always 
convert an NFA to a DFA, the latter may have exponentially more states than 
the NFA; fortunately, cases of this type are rare. 


2.3.1 An Informal View of Nondeterministic Finite 
Automata 


Like the DFA, an NFA has a finite set of states, a finite set of input symbols, 
one start state and a set of accepting states. It also has a transition function, 
which we shall commonly call $\delta$. The difference between the DFA and the NFA
is in the type of $\delta$. For the NFA, $\delta$ is a function that takes a state and input
symbol as arguments (like the DFA’s transition function), but returns a set 
of zero, one, or more states (rather than returning exactly one state, as the 
DFA must). We shall start with an example of an NFA, and then make the 
definitions precise. 


Example 2.6: Figure 2.9 shows a nondeterministic finite automaton, whose 
job is to accept all and only the strings of 0’s and 1’s that end in 01. State
$q_0$ is the start state, and we can think of the automaton as being in state $q_0$
(perhaps among other states) whenever it has not yet “guessed” that the final
01 has begun. It is always possible that the next symbol does not begin the
final 01, even if that symbol is 0. Thus, state $q_0$ may transition to itself on both
0 and 1.

However, if the next symbol is 0, this NFA also guesses that the final 01 has
begun. An arc labeled 0 thus leads from $q_0$ to state $q_1$. Notice that there are
two arcs labeled 0 out of $q_0$. The NFA has the option of going either to $q_0$ or
to $q_1$, and in fact it does both, as we shall see when we make the definitions
precise. In state $q_1$, the NFA checks that the next symbol is 1, and if so, it goes
to state $q_2$ and accepts.


Figure 2.9: An NFA accepting all strings that end in 01


Notice that there is no arc out of $q_1$ labeled 0, and there are no arcs at all
out of q2. In these situations, the thread of the NFA’s existence corresponding 
to those states simply “dies,” although other threads may continue to exist. 
While a DFA has exactly one arc out of each state for each input symbol, an 
NFA has no such constraint; we have seen in Fig. 2.9 cases where the number 
of arcs is zero, one, and two, for example. 
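One way to encode the NFA of Fig. 2.9 is sketched below. It assumes a dictionary from (state, symbol) pairs to sets of states, with a missing key standing for the empty set, so a thread with no arc simply dies. The representation and the names `NFA_DELTA` and `delta` are hypothetical. The printout lists zero, one, or two targets per state and symbol, which a DFA would not allow.

```python
# NFA of Fig. 2.9: each (state, symbol) pair maps to a *set* of states.
# Missing keys stand for the empty set (the thread "dies").
NFA_DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
             ("q1", "1"): {"q2"}}


def delta(q, a):
    """Transition function returning a (possibly empty) set of states."""
    return NFA_DELTA.get((q, a), set())


# Show how many arcs leave each state on each symbol.
for q in ("q0", "q1", "q2"):
    for a in "01":
        print(q, a, sorted(delta(q, a)), len(delta(q, a)))
```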


Figure 2.10: The states an NFA is in during the processing of input sequence
00101


Figure 2.10 suggests how an NFA processes inputs. We have shown what 
happens when the automaton of Fig. 2.9 receives the input sequence 00101. It 
starts in only its start state, $q_0$. When the first 0 is read, the NFA may go to
either state $q_0$ or state $q_1$, so it does both. These two threads are suggested by
the second column in Fig. 2.10. 


Then, the second 0 is read. State $q_0$ may again go to both $q_0$ and $q_1$.
However, state $q_1$ has no transition on 0, so it “dies.” When the third input, a
1, occurs, we must consider transitions from both $q_0$ and $q_1$. We find that $q_0$
goes only to $q_0$ on 1, while $q_1$ goes only to $q_2$. Thus, after reading 001, the NFA
is in states $q_0$ and $q_2$. Since $q_2$ is an accepting state, the NFA accepts 001.


However, the input is not finished. The fourth input, a 0, causes $q_2$’s thread
to die, while $q_0$ goes to both $q_0$ and $q_1$. The last input, a 1, sends $q_0$ to $q_0$ and
$q_1$ to $q_2$. Since we are again in an accepting state, 00101 is accepted.
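The walk-through of input 00101 can be reproduced by tracking the set of live threads after each symbol, one set per column of Fig. 2.10. This is a minimal sketch assuming the dictionary encoding of Fig. 2.9 described above; `trace` is a hypothetical helper name.

```python
NFA_DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
             ("q1", "1"): {"q2"}}
ACCEPTING = {"q2"}


def trace(w, start="q0"):
    """Return the list of state sets, one per column of Fig. 2.10."""
    current = {start}
    columns = [current]
    for a in w:
        # Every live thread moves; a thread with no arc on a simply dies.
        current = set().union(*(NFA_DELTA.get((q, a), set()) for q in current))
        columns.append(current)
    return columns


for prefix_len, states in enumerate(trace("00101")):
    print(prefix_len, sorted(states), "accepting" if states & ACCEPTING else "")
```

The printed columns are {q0}, {q0, q1}, {q0, q1}, {q0, q2}, {q0, q1}, {q0, q2}. The NFA is in an accepting state after 001 and after 00101, matching the text.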
2.3.2 Definition of Nondeterministic Finite Automata


Now, let us introduce the formal notions associated with nondeterministic finite 
automata. The differences between DFA’s and NFA’s will be pointed out as we 
do. An NFA is represented essentially like a DFA: 


$$A = (Q, \Sigma, \delta, q_0, F)$$
where:

1. Q is a finite set of states.

2. $\Sigma$ is a finite set of input symbols.

3. $q_0$, a member of Q, is the start state.

4. F, a subset of Q, is the set of final (or accepting) states.

5. $\delta$, the transition function, is a function that takes a state in Q and an
input symbol in $\Sigma$ as arguments and returns a subset of Q. Notice that
the only difference between an NFA and a DFA is in the type of value
that $\delta$ returns: a set of states in the case of an NFA and a single state in
the case of a DFA.


Example 2.7: The NFA of Fig. 2.9 can be specified formally as 


$$(\{q_0, q_1, q_2\}, \{0, 1\}, \delta, q_0, \{q_2\})$$


where the transition function $\delta$ is given by the transition table of Fig. 2.11.


             0           1
    →q0      {q0, q1}    {q0}
     q1      ∅           {q2}
    *q2      ∅           ∅


Figure 2.11: Transition table for an NFA that accepts all strings ending in 01


Notice that transition tables can be used to specify the transition function 
for an NFA as well as for a DFA. The only difference is that each entry in the 
table for the NFA is a set, even if the set is a singleton (has one member). Also 
notice that when there is no transition at all from a given state on a given input 
symbol, the proper entry is $\emptyset$, the empty set.
2.3.3 The Extended Transition Function


As for DFA’s, we need to extend the transition function $\delta$ of an NFA to a
function $\hat{\delta}$ that takes a state q and a string of input symbols w, and returns the
set of states that the NFA is in if it starts in state q and processes the string w.
The idea was suggested by Fig. 2.10; in essence $\hat{\delta}(q, w)$ is the column of states
found after reading w, if q is the lone state in the first column. For instance,
Fig. 2.10 suggests that $\hat{\delta}(q_0, 001) = \{q_0, q_2\}$. Formally, we define $\hat{\delta}$ for an NFA’s
transition function $\delta$ by:


BASIS: $\hat{\delta}(q, \epsilon) = \{q\}$. That is, without reading any input symbols, we are only
in the state we began in.


INDUCTION: Suppose w is of the form w = xa, where a is the final symbol of
w and x is the rest of w. Also suppose that $\hat{\delta}(q, x) = \{p_1, p_2, \ldots, p_k\}$. Let

$$\bigcup_{i=1}^{k} \delta(p_i, a) = \{r_1, r_2, \ldots, r_m\}$$

Then $\hat{\delta}(q, w) = \{r_1, r_2, \ldots, r_m\}$. Less formally, we compute $\hat{\delta}(q, w)$ by first
computing $\hat{\delta}(q, x)$, and by then following any transition from any of these states
that is labeled a.
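The basis and induction above can be transcribed almost literally into a recursive function. This is a minimal sketch assuming the same dictionary-of-sets encoding, where a missing key means $\emptyset$. The name `delta_hat` is hypothetical. Recursion depth grows with |w|, so this form is only meant for short strings.

```python
from typing import Dict, FrozenSet, Set, Tuple

NFADelta = Dict[Tuple[str, str], Set[str]]


def delta_hat(delta: NFADelta, q: str, w: str) -> FrozenSet[str]:
    """Extended transition function of an NFA, written like its definition."""
    if w == "":                      # BASIS: delta_hat(q, eps) = {q}
        return frozenset({q})
    x, a = w[:-1], w[-1]             # INDUCTION: w = xa
    ps = delta_hat(delta, q, x)      # {p1, ..., pk}
    result: Set[str] = set()
    for p in ps:                     # union of delta(p_i, a) for i = 1..k
        result |= delta.get((p, a), set())
    return frozenset(result)


fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
print(sorted(delta_hat(fig_2_9, "q0", "001")))  # ['q0', 'q2']
```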


Example 2.8: Let us use $\hat{\delta}$ to describe the processing of input 00101 by the
NFA of Fig. 2.9. A summary of the steps is:


1. $\hat{\delta}(q_0, \epsilon) = \{q_0\}$.

2. $\hat{\delta}(q_0, 0) = \delta(q_0, 0) = \{q_0, q_1\}$.

3. $\hat{\delta}(q_0, 00) = \delta(q_0, 0) \cup \delta(q_1, 0) = \{q_0, q_1\} \cup \emptyset = \{q_0, q_1\}$.

4. $\hat{\delta}(q_0, 001) = \delta(q_0, 1) \cup \delta(q_1, 1) = \{q_0\} \cup \{q_2\} = \{q_0, q_2\}$.

5. $\hat{\delta}(q_0, 0010) = \delta(q_0, 0) \cup \delta(q_2, 0) = \{q_0, q_1\} \cup \emptyset = \{q_0, q_1\}$.

6. $\hat{\delta}(q_0, 00101) = \delta(q_0, 1) \cup \delta(q_1, 1) = \{q_0\} \cup \{q_2\} = \{q_0, q_2\}$.


Line (1) is the basis rule. We obtain line (2) by applying $\delta$ to the lone state, $q_0$,
that is in the previous set, and get $\{q_0, q_1\}$ as a result. Line (3) is obtained by
taking the union over the two states in the previous set of what we get when we
apply $\delta$ to them with input 0. That is, $\delta(q_0, 0) = \{q_0, q_1\}$, while $\delta(q_1, 0) = \emptyset$.
For line (4), we take the union of $\delta(q_0, 1) = \{q_0\}$ and $\delta(q_1, 1) = \{q_2\}$. Lines (5)
and (6) are similar to lines (3) and (4).
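Lines (1) through (6) can be regenerated by applying the inductive rule once per symbol. This is a minimal sketch assuming the dictionary encoding of Fig. 2.9; `steps` is a hypothetical helper that yields each prefix together with $\hat{\delta}(q_0, \text{prefix})$.

```python
fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}


def steps(w, q0="q0"):
    """Yield (prefix, delta_hat(q0, prefix)) for every prefix of w."""
    current = frozenset({q0})
    yield "", current                # line (1): the basis rule
    for i, a in enumerate(w, start=1):
        # Union of delta(p, a) over the states p of the previous line.
        current = frozenset(s for p in current for s in fig_2_9.get((p, a), set()))
        yield w[:i], current


for line, (prefix, states) in enumerate(steps("00101"), start=1):
    label = prefix or "eps"
    print(f"({line}) delta_hat(q0, {label}) = {sorted(states)}")
```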
2.3.4 The Language of an NFA


As we have suggested, an NFA accepts a string w if it is possible to make any 
sequence of choices of next state, while reading the characters of w, and go from 
the start state to any accepting state. The fact that other choices using the 
input symbols of w lead to a nonaccepting state, or do not lead to any state at 
all (i.e., the sequence of states “dies” ), does not prevent w from being accepted 
by the NFA as a whole. Formally, if $A = (Q, \Sigma, \delta, q_0, F)$ is an NFA, then


$$L(A) = \{w \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$$


That is, L(A) is the set of strings w in $\Sigma^*$ such that $\hat{\delta}(q_0, w)$ contains at least
one accepting state.
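In code, the acceptance condition is simply a nonempty intersection with F. This is a minimal sketch for the NFA of Fig. 2.9 under the same dictionary encoding; `nfa_accepts` is a hypothetical name. It lists every accepted string of length at most 3.

```python
from itertools import product

fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
ACCEPTING = {"q2"}


def nfa_accepts(w, q0="q0"):
    """Return True if delta_hat(q0, w) contains at least one accepting state."""
    current = {q0}
    for a in w:
        current = {s for p in current for s in fig_2_9.get((p, a), set())}
    return bool(current & ACCEPTING)


accepted = ["".join(b) for n in range(4) for b in product("01", repeat=n)
            if nfa_accepts("".join(b))]
print(accepted)  # ['01', '001', '101']
```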


Example 2.9: As an example, let us prove formally that the NFA of Fig. 2.9 
accepts the language L = {w | w ends in 01}. The proof is a mutual induction 
of the following three statements that characterize the three states: 


1. $\hat{\delta}(q_0, w)$ contains $q_0$ for every w.
2. $\hat{\delta}(q_0, w)$ contains $q_1$ if and only if w ends in 0.
3. $\hat{\delta}(q_0, w)$ contains $q_2$ if and only if w ends in 01.


To prove these statements, we need to consider how A can reach each state; i.e., 
what was the last input symbol, and in what state was A just before reading 
that symbol?

Since the language of this automaton is the set of strings w such that $\hat{\delta}(q_0, w)$
contains $q_2$ (because $q_2$ is the only accepting state), the proof of these three
statements, in particular the proof of (3), guarantees that the language of this 
NFA is the set of strings ending in 01. The proof of the theorem is an induction 
on |w|, the length of w, starting with length 0. 


BASIS: If $|w| = 0$, then $w = \epsilon$. Statement (1) says that $\hat{\delta}(q_0, \epsilon)$ contains $q_0$,
which it does by the basis part of the definition of $\hat{\delta}$. For statement (2), we
know that $\epsilon$ does not end in 0, and we also know that $\hat{\delta}(q_0, \epsilon)$ does not contain
$q_1$, again by the basis part of the definition of $\hat{\delta}$. Thus, the hypotheses of both
directions of the if-and-only-if statement are false, and therefore both directions
of the statement are true. The proof of statement (3) for $w = \epsilon$ is essentially
the same as the above proof for statement (2).


INDUCTION: Assume that w = xa, where a is a symbol, either 0 or 1. We 
may assume statements (1) through (3) hold for x, and we need to prove them 
for w. That is, we assume $|w| = n + 1$, so $|x| = n$. We assume the inductive
hypothesis for n and prove it for n + 1. 


1. We know that $\hat{\delta}(q_0, x)$ contains $q_0$. Since there are transitions on both
0 and 1 from $q_0$ to itself, it follows that $\hat{\delta}(q_0, w)$ also contains $q_0$, so
statement (1) is proved for w. 




2. (If) Assume that w ends in 0; i.e., $a = 0$. By statement (1) applied to x,
we know that $\hat{\delta}(q_0, x)$ contains $q_0$. Since there is a transition from $q_0$ to
$q_1$ on input 0, we conclude that $\hat{\delta}(q_0, w)$ contains $q_1$.


(Only-if) Suppose $\hat{\delta}(q_0, w)$ contains $q_1$. If we look at the diagram of
Fig. 2.9, we see that the only way to get into state $q_1$ is if the input
sequence w is of the form x0. That is enough to prove the “only-if”
portion of statement (2).


3. (If) Assume that w ends in 01. Then if $w = xa$, we know that $a = 1$ and
x ends in 0. By statement (2) applied to x, we know that $\hat{\delta}(q_0, x)$ contains
$q_1$. Since there is a transition from $q_1$ to $q_2$ on input 1, we conclude that
$\hat{\delta}(q_0, w)$ contains $q_2$.

(Only-if) Suppose $\hat{\delta}(q_0, w)$ contains $q_2$. Looking at the diagram of Fig. 2.9,
we discover that the only way to get to state $q_2$ is for w to be of the form
x1, where $\hat{\delta}(q_0, x)$ contains $q_1$. By statement (2) applied to x, we know
that x ends in 0. Thus, w ends in 01, and we have proved statement (3).
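As a sanity check, though not a substitute for the inductive proof, the three statements can be tested exhaustively on all short strings. This sketch assumes the dictionary encoding of Fig. 2.9 and checks every string of length at most 10.

```python
from itertools import product

fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}


def delta_hat(w):
    """Return the set of states the NFA of Fig. 2.9 is in after reading w."""
    current = {"q0"}
    for a in w:
        current = {s for p in current for s in fig_2_9.get((p, a), set())}
    return current


checked = 0
for n in range(11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        states = delta_hat(w)
        assert "q0" in states                          # statement (1)
        assert ("q1" in states) == w.endswith("0")     # statement (2)
        assert ("q2" in states) == w.endswith("01")    # statement (3)
        checked += 1
print(f"statements (1)-(3) hold for all {checked} strings of length <= 10")
```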


2.3.5 Equivalence of Deterministic and Nondeterministic 
Finite Automata 


Although there are many languages for which an NFA is easier to construct 
than a DFA, such as the language (Example 2.6) of strings that end in 01, it is 
a surprising fact that every language that can be described by some NFA can 
also be described by some DFA. Moreover, the DFA in practice has about as
many states as the NFA, although it often has more transitions. In the worst
case, however, the smallest DFA can have $2^n$ states while the smallest NFA for
the same language has only n states.

The proof that DFA’s can do whatever NFA’s can do involves an important 
“construction” called the subset construction because it involves constructing all 
subsets of the set of states of the NFA. In general, many proofs about automata 
involve constructing one automaton from another. It is important for us to 
observe the subset construction as an example of how one formally describes one 
automaton in terms of the states and transitions of another, without knowing 
the specifics of the latter automaton. 

The subset construction starts from an NFA $N = (Q_N, \Sigma, \delta_N, q_0, F_N)$. Its
goal is the description of a DFA $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ such that L(D) =
L(N). Notice that the input alphabets of the two automata are the same, and 
the start state of D is the set containing only the start state of N. The other 
components of D are constructed as follows. 


• $Q_D$ is the set of subsets of $Q_N$; i.e., $Q_D$ is the power set of $Q_N$. Note
that if $Q_N$ has n states, then $Q_D$ will have $2^n$ states. Often, not all these
states are accessible from the start state of D. Inaccessible states can
be “thrown away,” so effectively, the number of states of D may be much
smaller than $2^n$.


• $F_D$ is the set of subsets S of $Q_N$ such that $S \cap F_N \neq \emptyset$. That is, $F_D$ is
all sets of N’s states that include at least one accepting state of N.


• For each set $S \subseteq Q_N$ and for each input symbol a in $\Sigma$,

$$\delta_D(S, a) = \bigcup_{p \in S} \delta_N(p, a)$$

That is, to compute $\delta_D(S, a)$ we look at all the states p in S, see what
states N goes to from p on input a, and take the union of all those states.
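The three rules can be sketched in Python by building the full power-set DFA. This is a minimal sketch under stated assumptions: the NFA transition function is a dictionary from (state, symbol) to a set, with missing keys meaning $\emptyset$, and DFA states are represented as frozensets so they can serve as dictionary keys. The function name `subset_construction` is hypothetical.

```python
from itertools import combinations


def subset_construction(states, sigma, delta, q0, final):
    """Build the full DFA whose states are all subsets of the NFA's states."""
    subsets = [frozenset(c) for r in range(len(states) + 1)
               for c in combinations(sorted(states), r)]
    dfa_delta = {}
    for S in subsets:
        for a in sigma:
            # delta_D(S, a) is the union of delta_N(p, a) over p in S.
            dfa_delta[(S, a)] = frozenset(
                r for p in S for r in delta.get((p, a), set()))
    dfa_final = {S for S in subsets if S & final}   # S meets F_N
    return subsets, dfa_delta, frozenset({q0}), dfa_final


fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
Q_D, delta_D, start, F_D = subset_construction(
    {"q0", "q1", "q2"}, "01", fig_2_9, "q0", {"q2"})
print(len(Q_D), len(F_D))  # 8 4
```

Applied to Fig. 2.9, this yields the eight states and four accepting states of the complete construction.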


                       0            1
     ∅                 ∅            ∅
    →{q0}              {q0, q1}     {q0}
     {q1}              ∅            {q2}
    *{q2}              ∅            ∅
     {q0, q1}          {q0, q1}     {q0, q2}
    *{q0, q2}          {q0, q1}     {q0}
    *{q1, q2}          ∅            {q2}
    *{q0, q1, q2}      {q0, q1}     {q0, q2}


Figure 2.12: The complete subset construction from Fig. 2.9


Example 2.10: Let N be the automaton of Fig. 2.9 that accepts all strings 
that end in 01. Since N’s set of states is {q0, q1, q2}, the subset construction 
produces a DFA with $2^3 = 8$ states, corresponding to all the subsets of these
three states. Figure 2.12 shows the transition table for these eight states; we 
shall show shortly the details of how some of these entries are computed. 

Notice that this transition table belongs to a deterministic finite automaton.
The entries in the table are sets only because the states of the constructed DFA
are themselves sets. To make the point clearer, we can invent new names for these
states, e.g., A for $\emptyset$, B for $\{q_0\}$, and so on. The DFA transition table of Fig 2.13 defines
exactly the same automaton as Fig. 2.12, but makes clear the point that the
entries in the table are single states of the DFA.

Of the eight states in Fig. 2.13, starting in the start state B, we can only 
reach states B, E, and F. The other five states are inaccessible from the start 
state and may as well not be there. We often can avoid the exponential-time step 
of constructing transition-table entries for every subset of states if we perform 
“lazy evaluation” on the subsets, as follows. 


BASIS: We know for certain that the singleton set consisting only of N’s start 
state is accessible. 


           0   1
     A     A   A
    →B     E   B
     C     A   D
    *D     A   A
     E     E   F
    *F     E   B
    *G     A   D
    *H     E   F


Figure 2.13: Renaming the states of Fig. 2.12


INDUCTION: Suppose we have determined that set S of states is accessible. 
Then for each input symbol a, compute the set of states $\delta_D(S, a)$; we know that
these sets of states will also be accessible. 
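The lazy evaluation scheme is essentially a breadth-first search over subsets, starting from $\{q_0\}$. This is a minimal sketch under the same encoding assumptions as before; the name `lazy_subset_construction` is hypothetical. The BASIS seeds the worklist, and the INDUCTION adds every newly discovered $\delta_D(S, a)$.

```python
from collections import deque


def lazy_subset_construction(sigma, delta, q0, final):
    """Subset construction that only builds subsets reachable from {q0}."""
    start = frozenset({q0})
    dfa_delta = {}
    seen = {start}                    # BASIS: {q0} is accessible
    todo = deque([start])
    while todo:
        S = todo.popleft()
        for a in sigma:               # INDUCTION: delta_D(S, a) is accessible
            T = frozenset(r for p in S for r in delta.get((p, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_final = {S for S in seen if S & final}
    return seen, dfa_delta, start, dfa_final


fig_2_9 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
Q_D, delta_D, start, F_D = lazy_subset_construction("01", fig_2_9, "q0", {"q2"})
for S in sorted(Q_D, key=sorted):
    print(sorted(S), "accepting" if S in F_D else "")
```

For the NFA of Fig. 2.9, only $\{q_0\}$, $\{q_0, q_1\}$, and $\{q_0, q_2\}$ are generated, with six transitions in all. The other five subsets are never constructed.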


For the example at hand, we know that $\{q_0\}$ is a state of the DFA D. We
find that $\delta_D(\{q_0\}, 0) = \{q_0, q_1\}$ and $\delta_D(\{q_0\}, 1) = \{q_0\}$. Both these facts are
established by looking at the transition diagram of Fig. 2.9 and observing that
on 0 there are arcs out of $q_0$ to both $q_0$ and $q_1$, while on 1 there is an arc only
to $q_0$. We thus have one row of the transition table for the DFA: the second
row in Fig. 2.12.

One of the two sets we computed is “old”; $\{q_0\}$ has already been considered.
However, the other, $\{q_0, q_1\}$, is new and its transitions must be computed.
We find $\delta_D(\{q_0, q_1\}, 0) = \{q_0, q_1\}$ and $\delta_D(\{q_0, q_1\}, 1) = \{q_0, q_2\}$. For instance,
to see the latter calculation, we know that

$$\delta_D(\{q_0, q_1\}, 1) = \delta_N(q_0, 1) \cup \delta_N(q_1, 1) = \{q_0\} \cup \{q_2\} = \{q_0, q_2\}$$


We now have the fifth row of Fig. 2.12, and we have discovered one new
state of D, which is $\{q_0, q_2\}$. A similar calculation tells us

$$\delta_D(\{q_0, q_2\}, 0) = \delta_N(q_0, 0) \cup \delta_N(q_2, 0) = \{q_0, q_1\} \cup \emptyset = \{q_0, q_1\}$$
$$\delta_D(\{q_0, q_2\}, 1) = \delta_N(q_0, 1) \cup \delta_N(q_2, 1) = \{q_0\} \cup \emptyset = \{q_0\}$$

These calculations give us the sixth row of Fig. 2.12, but they give us only sets
of states that we have already seen.

Thus, the subset construction has converged; we know all the accessible 
states and their transitions. The entire DFA is shown in Fig. 2.14. Notice that 
it has only three states, which is, by coincidence, exactly the same number of 
states as the NFA of Fig. 2.9, from which it was constructed. However, the DFA 
of Fig. 2.14 has six transitions, compared with the four transitions in Fig. 2.9. 


Figure 2.14: The DFA constructed from the NFA of Fig 2.9


We need to show formally that the subset construction works, although
the intuition was suggested by the examples. After reading a sequence of input
symbols w, the constructed DFA is in one state that is the set of NFA states
that the NFA would be in after reading w. Since the accepting states of the
DFA are those sets that include at least one accepting state of the NFA, and the
NFA also accepts if it gets into at least one of its accepting states, we may then
conclude that the DFA and NFA accept exactly the same strings, and therefore
accept the same language.


Theorem 2.11: If $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ is the DFA constructed from
NFA $N = (Q_N, \Sigma, \delta_N, q_0, F_N)$ by the subset construction, then L(D) = L(N).


PROOF: What we actually prove first, by induction on $|w|$, is that

$$\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_N(q_0, w)$$

Notice that each of the $\hat{\delta}$ functions returns a set of states from $Q_N$, but $\hat{\delta}_D$
interprets this set as one of the states of $Q_D$ (which is the power set of $Q_N$),
while $\hat{\delta}_N$ interprets this set as a subset of $Q_N$.


BASIS: Let $|w| = 0$; that is, $w = \epsilon$. By the basis definitions of $\hat{\delta}$ for DFA’s and
NFA’s, both $\hat{\delta}_D(\{q_0\}, \epsilon)$ and $\hat{\delta}_N(q_0, \epsilon)$ are $\{q_0\}$.


INDUCTION: Let w be of length n + 1, and assume the statement for length
n. Break w up as w = xa, where a is the final symbol of w. By the inductive
hypothesis, $\hat{\delta}_D(\{q_0\}, x) = \hat{\delta}_N(q_0, x)$. Let both these sets of N’s states be
$\{p_1, p_2, \ldots, p_k\}$.

The inductive part of the definition of $\hat{\delta}$ for NFA’s tells us

$$\hat{\delta}_N(q_0, w) = \bigcup_{i=1}^{k} \delta_N(p_i, a) \tag{2.2}$$

The subset construction, on the other hand, tells us that

$$\delta_D(\{p_1, p_2, \ldots, p_k\}, a) = \bigcup_{i=1}^{k} \delta_N(p_i, a) \tag{2.3}$$

Now, let us use (2.3) and the fact that $\hat{\delta}_D(\{q_0\}, x) = \{p_1, p_2, \ldots, p_k\}$ in the
inductive part of the definition of $\hat{\delta}$ for DFA’s:

$$\hat{\delta}_D(\{q_0\}, w) = \delta_D(\hat{\delta}_D(\{q_0\}, x), a) = \delta_D(\{p_1, p_2, \ldots, p_k\}, a) = \bigcup_{i=1}^{k} \delta_N(p_i, a) \tag{2.4}$$

Thus, Equations (2.2) and (2.4) demonstrate that $\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_N(q_0, w)$.
When we observe that D and N both accept w if and only if $\hat{\delta}_D(\{q_0\}, w)$ or
$\hat{\delta}_N(q_0, w)$, respectively, contain a state in $F_N$, we have a complete proof that
L(D) = L(N).
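For the running example, the conclusion of Theorem 2.11 can be checked on all short strings. The check compares the three accessible rows of Fig. 2.12, written out as an ordinary DFA table, against the NFA of Fig. 2.9. This is an empirical sketch, not a proof. The variable names B, E, F follow the renaming of Fig. 2.13, and the helper names are hypothetical.

```python
from itertools import product

# NFA N of Fig. 2.9 (missing keys mean the empty set).
N_DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}

# The three accessible rows of Fig. 2.12, written as an ordinary DFA table.
B, E, F = frozenset({"q0"}), frozenset({"q0", "q1"}), frozenset({"q0", "q2"})
D_DELTA = {(B, "0"): E, (B, "1"): B,
           (E, "0"): E, (E, "1"): F,
           (F, "0"): E, (F, "1"): B}


def n_accepts(w):
    current = {"q0"}
    for a in w:
        current = {r for p in current for r in N_DELTA.get((p, a), set())}
    return "q2" in current


def d_accepts(w):
    state = B
    for a in w:
        state = D_DELTA[(state, a)]   # exactly one move per symbol
    return "q2" in state              # F_D: subsets containing q2


for n in range(11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert d_accepts(w) == n_accepts(w)
print("L(D) and L(N) agree on every string of length <= 10")
```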


Theorem 2.12: A language L is accepted by some DFA if and only if L is 
accepted by some NFA. 


PROOF: (If) The “if” part is the subset construction and Theorem 2.11. 


(Only-if) This part is easy; we have only to convert a DFA into an identical NFA. 
Put intuitively, if we have the transition diagram for a DFA, we can also inter- 
pret it as the transition diagram of an NFA, which happens to have exactly one 
choice of transition in any situation. More formally, let $D = (Q, \Sigma, \delta_D, q_0, F)$
be a DFA. Define $N = (Q, \Sigma, \delta_N, q_0, F)$ to be the equivalent NFA, where $\delta_N$ is
defined by the rule:


• If $\delta_D(q, a) = p$, then $\delta_N(q, a) = \{p\}$.


It is then easy to show by induction on $|w|$ that if $\hat{\delta}_D(q_0, w) = p$ then

$$\hat{\delta}_N(q_0, w) = \{p\}$$


We leave the proof to the reader. As a consequence, w is accepted by D if and 
only if it is accepted by N; i.e., L(D) = L(N). 
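The conversion in the "only-if" direction amounts to wrapping each target state in a singleton set. This is a minimal sketch assuming dictionary encodings for both automata. The helper names and the parity DFA are hypothetical illustrations.

```python
def dfa_to_nfa(dfa_delta):
    """Only-if part of Theorem 2.12: delta_N(q, a) = {delta_D(q, a)}."""
    return {key: {p} for key, p in dfa_delta.items()}


def nfa_delta_hat(nfa_delta, q, w):
    current = {q}
    for a in w:
        current = {r for p in current for r in nfa_delta.get((p, a), set())}
    return current


def dfa_delta_hat(dfa_delta, q, w):
    for a in w:
        q = dfa_delta[(q, a)]
    return q


# Hypothetical DFA: accept strings with an odd number of 1's.
parity = {("even", "0"): "even", ("even", "1"): "odd",
          ("odd", "0"): "odd", ("odd", "1"): "even"}
nfa = dfa_to_nfa(parity)
w = "10110"
print(dfa_delta_hat(parity, "even", w), nfa_delta_hat(nfa, "even", w))  # odd {'odd'}
```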


2.3.6 A Bad Case for the Subset Construction 


In Example 2.10 we found that the DFA had no more states than the NFA. 
As we mentioned, it is quite common in practice for the DFA to have roughly 
the same number of states as the NFA from which it is constructed. However, 
exponential growth in the number of states is possible; all the $2^n$ DFA states
that we could construct from an n-state NFA may turn out to be accessible. The
following example does not quite reach that bound, but it is an understandable
way to reach $2^n$ states in the smallest DFA that is equivalent to an $(n+1)$-state
NFA.


Example 2.13: Consider the NFA N of Fig. 2.15. L(N) is the set of all strings 
of 0’s and 1’s such that the nth symbol from the end is 1. Intuitively, a DFA 
D that accepts this language must remember the last n symbols it has read.
Since any of $2^n$ subsets of the last n symbols could have been 1, if D has fewer
than $2^n$ states, then there would be some state q such that D can be in state q
after reading two different sequences of n bits, say $a_1 a_2 \cdots a_n$ and $b_1 b_2 \cdots b_n$.

Since the sequences are different, they must differ in some position, say
$a_i \neq b_i$. Suppose (by symmetry) that $a_i = 1$ and $b_i = 0$. If $i = 1$, then q
must be both an accepting state and a nonaccepting state, since $a_1 a_2 \cdots a_n$ is
accepted (the nth symbol from the end is 1) and $b_1 b_2 \cdots b_n$ is not. If $i > 1$,
then consider the state p that D enters from q after reading $i - 1$ 0’s. Then p must
be both accepting and nonaccepting, since $a_i a_{i+1} \cdots a_n 0 0 \cdots 0$ is accepted and
$b_i b_{i+1} \cdots b_n 0 0 \cdots 0$ is not.
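The distinguishing step in this argument can be checked concretely for a small n. For every pair of distinct n-bit sequences, the suffix of $i - 1$ 0's (where i is a position at which they differ) should put exactly one of the two extended strings in the language. The sketch below checks this for n = 4 only and uses the first differing position. It illustrates the argument rather than proving the general claim. The helper names are hypothetical.

```python
from itertools import product


def in_language(w, n):
    """w is in L exactly when its nth symbol from the end is 1."""
    return len(w) >= n and w[-n] == "1"


def distinguishing_suffix(a, b):
    """Return 0^(i-1), where i is the first (1-based) position where a and b differ."""
    i = next(j for j in range(len(a)) if a[j] != b[j]) + 1
    return "0" * (i - 1)


n = 4
seqs = ["".join(bits) for bits in product("01", repeat=n)]
for a in seqs:
    for b in seqs:
        if a != b:
            z = distinguishing_suffix(a, b)
            # Exactly one of az, bz is in L, so no DFA state can serve both.
            assert in_language(a + z, n) != in_language(b + z, n)
print(f"all {len(seqs)} sequences of {n} bits are pairwise distinguishable")
```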


Figure 2.15: This NFA has no equivalent DFA with fewer than $2^n$ states


Now, let us see how the NFA N of Fig. 2.15 works. There is a state $q_0$ that
the NFA is always in, regardless of what inputs have been read. If the next
input is 1, N may also “guess” that this 1 will be the nth symbol from the end,
so it goes to state $q_1$ as well as $q_0$. From state $q_1$, any input takes N to $q_2$,
the next input takes it to $q_3$, and so on, until $n - 1$ inputs later, it is in the
accepting state $q_n$. The formal statement of what the states of N do is:


1. N is in state $q_0$ after reading any sequence of inputs w.


2. N is in state $q_i$, for $i = 1, 2, \ldots, n$, after reading input sequence w if and
only if the ith symbol from the end of w is 1; that is, w is of the form
$x 1 a_1 a_2 \cdots a_{i-1}$, where the $a_j$’s are each input symbols.


We shall not prove these statements formally; the proof is an easy induction 
on |w|, mimicking Example 2.9. To complete the proof that the automaton 
accepts exactly those strings with a 1 in the nth position from the end, we 
consider statement (2) with $i = n$. That says N is in state $q_n$ if and only if
the nth symbol from the end is 1. But $q_n$ is the only accepting state, so that
condition also characterizes exactly the set of strings accepted by N. 
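The exponential blowup can be observed by building N from the description above and counting the subsets the lazy subset construction reaches. This sketch assumes states are numbered 0 through n with 0 playing the role of $q_0$. The helper names are hypothetical. It counts reachable subset-construction states. The minimality of $2^n$ comes from the separate argument earlier in this example, not from this count.

```python
from collections import deque


def nth_from_end_nfa(n):
    """NFA of Fig. 2.15 as described in the text; state i stands for q_i."""
    delta = {(0, "0"): {0}, (0, "1"): {0, 1}}    # q0 loops; on 1 also guess q1
    for i in range(1, n):
        delta[(i, "0")] = {i + 1}                # any input moves q_i to q_{i+1}
        delta[(i, "1")] = {i + 1}
    return delta                                 # q_n has no outgoing transitions


def reachable_subsets(delta, sigma="01"):
    start = frozenset({0})
    seen, todo = {start}, deque([start])
    while todo:
        S = todo.popleft()
        for a in sigma:
            T = frozenset(r for p in S for r in delta.get((p, a), set()))
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return seen


for n in range(1, 9):
    print(n, len(reachable_subsets(nth_from_end_nfa(n))))  # 2**n each time
```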


2.3.7 Exercises for Section 2.3 
* Exercise 2.3.1: Convert to a DFA the following NFA: 
The Pigeonhole Principle


In Example 2.13 we used an important reasoning technique called the 
pigeonhole principle. Colloquially, if you have more pigeons than pigeon- 
holes, and each pigeon flies into some pigeonhole, then there must be at 
least one hole that has more than one pigeon. In our example, the “pi- 
geons” are the sequences of n bits, and the “pigeonholes” are the states. 
Since there are fewer states than sequences, one state must be assigned 
two sequences. 

The pigeonhole principle may appear obvious, but it actually depends 
on the number of pigeonholes being finite. Thus, it works for finite-state 
automata, with the states as pigeonholes, but does not apply to other 
kinds of automata that have an infinite number of states. 

To see why the finiteness of the number of pigeonholes is essential, 
consider the infinite situation where the pigeonholes correspond to integers 
1, 2, .... Number the pigeons 0, 1, 2, ..., so there is one more pigeon than
there are pigeonholes. However, we can send pigeon i to hole $i + 1$ for all
$i \ge 0$. Then each of the infinite number of pigeons gets a pigeonhole, and
no two pigeons have to share a pigeonhole. 


Exercise 2.3.2: Convert to a DFA the following NFA: 


! Exercise 2.3.3: Convert the following NFA to a DFA and informally describe 
the language it accepts. 


! Exercise 2.3.4: Give nondeterministic finite automata to accept the following 
languages. Try to take advantage of nondeterminism as much as possible. 
Dead States and DFA’s Missing Some Transitions


We have formally defined a DFA to have a transition from any state, 
on any input symbol, to exactly one state. However, sometimes, it is 
more convenient to design the DFA to “die” in situations where we know 
it is impossible for any extension of the input sequence to be accepted. 
For instance, observe the automaton of Fig. 1.2, which did its job by 
recognizing a single keyword, then, and nothing else. Technically, this 
automaton is not a DFA, because it lacks transitions on most symbols 
from each of its states. 

However, such an automaton is an NFA. If we use the subset construc- 
tion to convert it to a DFA, the automaton looks almost the same, but it 
includes a dead state, that is, a nonaccepting state that goes to itself on 
every possible input symbol. The dead state corresponds to $\emptyset$, the empty
set of states of the automaton of Fig. 1.2.

In general, we can add a dead state to any automaton that has no 
more than one transition for any state and input symbol. Then, add a 
transition to the dead state from each other state q, on all input symbols 
for which q has no other transition. The result will be a DFA in the strict 
sense. Thus, we shall sometimes refer to an automaton as a DFA if it has 
at most one transition out of any state on any symbol, rather than if it 
has exactly one transition. 
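Adding a dead state to an automaton with at most one transition per state and symbol is a short completion step. This is a minimal sketch under stated assumptions. The partial transition function maps (state, symbol) to a single state. The dead-state name `DEAD` and the tiny "then" recognizer over a four-letter alphabet are hypothetical stand-ins, not the actual automaton of Fig. 1.2.

```python
DEAD = "dead"  # hypothetical name for the new nonaccepting dead state


def add_dead_state(states, sigma, partial_delta):
    """Complete a DFA that has at most one transition per (state, symbol).

    Missing pairs are sent to DEAD, which loops to itself on every symbol.
    """
    full = dict(partial_delta)
    for q in list(states) + [DEAD]:
        for a in sigma:
            full.setdefault((q, a), DEAD)
    return set(states) | {DEAD}, full


# Hypothetical recognizer for the keyword "then" over a tiny alphabet.
sigma = "hent"
partial = {("s0", "t"): "s1", ("s1", "h"): "s2",
           ("s2", "e"): "s3", ("s3", "n"): "s4"}
Q, delta = add_dead_state({"s0", "s1", "s2", "s3", "s4"}, sigma, partial)
print(len(Q), len(delta))  # 6 24
```

Every one of the 6 states now has exactly one transition on each of the 4 symbols, so the result is a DFA in the strict sense.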


* a) The set of strings over alphabet {0,1,...,9} such that the final digit has 
appeared before. 


b) The set of strings over alphabet {0,1,...,9} such that the final digit has 
not appeared before. 


c) The set of strings of 0’s and 1’s such that there are two 0’s separated by 
a number of positions that is a multiple of 4. Note that 0 is an allowable 
multiple of 4. 


Exercise 2.3.5: In the only-if portion of Theorem 2.12 we omitted the proof 
by induction on $|w|$ that if $\hat{\delta}_D(q_0, w) = p$ then $\hat{\delta}_N(q_0, w) = \{p\}$. Supply this
proof. 


Exercise 2.3.6: In the box on “Dead States and DFA’s Missing Some Tran- 
sitions,” we claim that if N is an NFA that has at most one choice of state for 
any state and input symbol (i.e., $\delta(q, a)$ never has size greater than 1), then the
DFA D constructed from N by the subset construction has exactly the states 
and transitions of N plus transitions to a new dead state whenever N is missing 
a transition for a given state and input symbol. Prove this contention. 
Exercise 2.3.7: In Example 2.13 we claimed that the NFA N is in state $q_i$,
for $i = 1, 2, \ldots, n$, after reading input sequence w if and only if the ith symbol
from the end of w is 1. Prove this claim. 


2.4 An Application: Text Search 


In this section, we shall see that the abstract study of the previous section, 
where we considered the “problem” of deciding whether a sequence of bits ends 
in 01, is actually an excellent model for several real problems that appear in 
applications such as Web search and extraction of information from text. 


2.4.1 Finding Strings in Text 


A common problem in the age of the Web and other on-line text repositories 
is the following. Given a set of words, find all documents that contain one 
(or all) of those words. A search engine is a popular example of this process. 
The search engine uses a particular technology, called inverted indexes, where 
for each word appearing on the Web (there are 100,000,000 different words), 
a list of all the places where that word occurs is stored. Machines with very 
large amounts of main memory keep the most common of these lists available, 
allowing many people to search for documents at once. 

Inverted-index techniques do not make use of finite automata, but they also 
take very large amounts of time for crawlers to copy the Web and set up the 
indexes. There are a number of related applications that are unsuited for in- 
verted indexes, but are good applications for automaton-based techniques. The 
characteristics that make an application suitable for searches that use automata 
are: 


1. The repository on which the search is conducted is rapidly changing. For 
example: 


(a) Every day, news analysts want to search the day’s on-line news arti- 
cles for relevant topics. For example, a financial analyst might search 
for certain stock ticker symbols or names of companies. 


(b) A “shopping robot” wants to search for the current prices charged 
for the items that its clients request. The robot will retrieve current 
catalog pages from the Web and then search those pages for words 
that suggest a price for a particular item. 


2. The documents to be searched cannot be cataloged. For example, Ama- 
zon.com does not make it easy for crawlers to find all the pages for all the 
books that the company sells. Rather, these pages are generated “on the 
fly” in response to queries. However, we could send a query for books on 
a certain topic, say “finite automata,” and then search the pages retrieved 
for certain words, e.g., “excellent” in a review portion. 
2.4.2 Nondeterministic Finite Automata for Text Search


Suppose we are given a set of words, which we shall call the keywords, and we 
want to find occurrences of any of these words. In applications such as these, a 
useful way to proceed is to design a nondeterministic finite automaton, which 
signals, by entering an accepting state, that it has seen one of the keywords. 
The text of a document is fed, one character at a time, to this NFA, which then
recognizes occurrences of the keywords in this text. There is a simple form to 
an NFA that recognizes a set of keywords. 


1. There is a start state with a transition to itself on every input symbol, 
e.g. every printable ASCII character if we are examining text. Intuitively, 
the start state represents a “guess” that we have not yet begun to see one 
of the keywords, even if we have seen some letters of one of these words. 


2. For each keyword a_1 a_2 ··· a_k, there are k states, say q_1, q_2, ..., q_k. There
is a transition from the start state to q_1 on symbol a_1, a transition from
q_1 to q_2 on symbol a_2, and so on. The state q_k is an accepting state and
indicates that the keyword a_1 a_2 ··· a_k has been found.
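The two rules are mechanical enough to code directly. The Python sketch below is an illustration only, and the function name `keyword_nfa` is hypothetical. It numbers the start state 1 and gives each keyword fresh, consecutively numbered states, a choice made so that web and ebay receive the same numbers as in Fig. 2.16. The start state's self-loop on every symbol of Σ is left implicit rather than stored, since Σ may be large.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[int, str], Set[int]]

def keyword_nfa(keywords: List[str]) -> Tuple[Delta, Set[int]]:
    """Build the keyword NFA of rules 1 and 2.

    State 1 is the start state. Its self-loop on every input symbol is
    left implicit (a simulator must keep state 1 active at all times).
    Each keyword a_1 a_2 ... a_k gets k fresh states q_1, ..., q_k,
    numbered consecutively; q_k is accepting.
    """
    delta: Delta = {}
    accepting: Set[int] = set()
    next_state = 2
    for word in keywords:
        prev = 1                          # every chain starts at the start state
        for symbol in word:               # arc prev --symbol--> fresh state
            delta.setdefault((prev, symbol), set()).add(next_state)
            prev = next_state
            next_state += 1
        accepting.add(prev)               # q_k: the whole keyword has been seen
    return delta, accepting

delta, accepting = keyword_nfa(["web", "ebay"])
print(sorted(accepting))                  # [4, 8]
print(delta[(1, "w")], delta[(1, "e")])   # {2} {5}
```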


Example 2.14: Suppose we want to design an NFA to recognize occurrences 
of the words web and ebay. The transition diagram for the NFA designed using 
the rules above is in Fig. 2.16. State 1 is the start state, and we use Σ to stand
for the set of all printable ASCII characters. States 2 through 4 have the job
of recognizing web, while states 5 through 8 recognize ebay.


Figure 2.16: An NFA that searches for the words web and ebay


Of course the NFA is not a program. We have two major choices for an 
implementation of this NFA. 


1. Write a program that simulates this NFA by computing the set of states 
it is in after reading each input symbol. The simulation was suggested in 
Fig. 2.10. 


2. Convert the NFA to an equivalent DFA using the subset construction. 
Then simulate the DFA directly. 
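Choice (1) can be sketched in a few lines of Python. The sketch assumes the keyword NFA of Section 2.4.2 with start state 1 (the helper names are hypothetical), keeps state 1 active after every symbol to account for its self-loop on Σ, and reports the index of the last character of each keyword occurrence.

```python
from typing import Dict, List, Set, Tuple

def keyword_nfa(keywords: List[str]) -> Tuple[Dict[Tuple[int, str], Set[int]], Dict[int, str]]:
    """Keyword NFA with start state 1; maps each accepting state to its keyword."""
    delta: Dict[Tuple[int, str], Set[int]] = {}
    accepting: Dict[int, str] = {}
    next_state = 2
    for word in keywords:
        prev = 1
        for symbol in word:
            delta.setdefault((prev, symbol), set()).add(next_state)
            prev, next_state = next_state, next_state + 1
        accepting[prev] = word
    return delta, accepting

def find_keywords(keywords: List[str], text: str) -> List[Tuple[int, str]]:
    """Simulate the NFA on `text`, returning (index of last character, keyword)
    for every occurrence of a keyword."""
    delta, accepting = keyword_nfa(keywords)
    current: Set[int] = {1}
    hits: List[Tuple[int, str]] = []
    for i, symbol in enumerate(text):
        nxt = {1}                                   # start state's self-loop on Σ
        for q in current:
            nxt |= delta.get((q, symbol), set())    # follow every arc labeled `symbol`
        current = nxt
        for q in sorted(current & set(accepting)):  # entering an accepting state is a hit
            hits.append((i, accepting[q]))
    return hits

print(find_keywords(["web", "ebay"], "webay and ebay"))
# [(2, 'web'), (4, 'ebay'), (13, 'ebay')]
```

The overlapping occurrences of web and ebay in webay are both reported, because the set of states keeps every partial match alive at once.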
Some text-processing programs, such as advanced forms of the UNIX grep
command (egrep and fgrep), actually use a mixture of these two approaches.
However, for our purposes, conversion to a DFA is easy and is guaranteed not 
to increase the number of states. 


2.4.3 A DFA to Recognize a Set of Keywords 


We can apply the subset construction to any NFA. However, when we apply that 
construction to an NFA that was designed from a set of keywords, according to 
the strategy of Section 2.4.2, we find that the number of states of the DFA is 
never greater than the number of states of the NFA. Since in the worst case the 
number of states exponentiates as we go to the DFA, this observation is good 
news and explains why the method of designing an NFA for keywords and then 
constructing a DFA from it is used frequently. The rules for constructing the
set of DFA states are as follows.


a) If q_0 is the start state of the NFA, then {q_0} is one of the states of the
DFA.


b) Suppose p is one of the NFA states, and it is reached from the start state
along a path whose symbols are a_1 a_2 ··· a_m. Then one of the DFA states
is the set of NFA states consisting of:


1. q_0.
2. p.
3. Every other state of the NFA that is reachable from q_0 by following
a path whose labels are a suffix of a_1 a_2 ··· a_m, that is, any sequence
of symbols of the form a_j a_{j+1} ··· a_m.


Note that in general, there will be one DFA state for each NFA state p. However, 
in step (b), two states may actually yield the same set of NFA states, and thus 
become one state of the DFA. For example, if two of the keywords begin with 
the same letter, say a, then the two NFA states that are reached from q_0 by an
arc labeled a will yield the same set of NFA states and thus get merged in the 
DFA. 
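Rule (b) can be turned into a small function. In the Python sketch below (names hypothetical), each NFA state records the label of its path from the start state, and the DFA state for p is q_0 (numbered 1) together with every state reachable from q_0 along a suffix of that label. Including the whole label as one of the suffixes is what puts p in the set, and it is also how two states with the same label end up with the same DFA state.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def keyword_nfa(keywords: List[str]):
    """Keyword NFA (start state 1); also records each state's path label."""
    delta: Dict[Tuple[int, str], Set[int]] = {}
    label: Dict[int, str] = {1: ""}
    next_state = 2
    for word in keywords:
        prev = 1
        for symbol in word:
            delta.setdefault((prev, symbol), set()).add(next_state)
            label[next_state] = label[prev] + symbol
            prev, next_state = next_state, next_state + 1
    return delta, label

def follow(delta, word: str) -> Set[int]:
    """States reached from state 1 along keyword arcs labeled exactly `word`."""
    states = {1}
    for symbol in word:
        states = set().union(*(delta.get((q, symbol), set()) for q in states))
    return states

def dfa_state_for(delta, label, p: int) -> FrozenSet[int]:
    """Rule (b): q_0, p, and every state reached from q_0 by a suffix a_j ... a_m."""
    word = label[p]
    result = {1, p}
    for j in range(len(word)):           # j = 0 is the whole label a_1 ... a_m
        result |= follow(delta, word[j:])
    return frozenset(result)

delta, label = keyword_nfa(["web", "ebay"])
for p in range(2, 9):
    print(p, sorted(dfa_state_for(delta, label, p)))
```

For web and ebay this prints the sets {1, 2}, {1, 3, 5}, {1, 4, 6}, {1, 5}, {1, 6}, {1, 7}, and {1, 8} for p = 2, ..., 8, which together with {1} for the start state are the states named in Example 2.15. With keywords such as ab and ac, the two states reached on a yield the same set and merge.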


Example 2.15: The construction of a DFA from the NFA of Fig. 2.16 is shown 
in Fig. 2.17. Each of the states of the DFA is located in the same position as 
the state p from which it is derived using rule (b) above. For example, consider 
the state 135, which is our shorthand for {1,3,5}. This state was constructed 
from state 3. It includes the start state, 1, because every set of the DFA states 
does. It also includes state 5 because that state is reached from state 1 by a 
suffix, e, of the string we that reaches state 3 in Fig. 2.16. 

The transitions for each of the DFA states may be calculated according to
the subset construction. However, the rule is simple. From any set of states that
includes the start state q_0 and some other states {p_1, p_2, ..., p_n}, determine, for
each symbol x, where the p_i’s go in the NFA, and let this DFA state have a
transition labeled x to the DFA state consisting of q_0 and all the targets of the
p_i’s and q_0 on symbol x. On all symbols x such that there are no transitions
out of any of the p_i’s on symbol x, let this DFA state have a transition on x to
that state of the DFA consisting of q_0 and all states that are reached from q_0
in the NFA following an arc labeled x.


Figure 2.17: Conversion of the NFA from Fig. 2.16 to a DFA

For instance, consider state 135 of Fig. 2.17. The NFA of Fig. 2.16 has 
transitions on symbol b from states 3 and 5 to states 4 and 6, respectively. 
Therefore, on symbol b, 135 goes to 146. On symbol e, there are no transitions 
of the NFA out of 3 or 5, but there is a transition from 1 to 5. Thus, in the 
DFA, 135 goes to 15 on input e. Similarly, on input w, 135 goes to 12. 

On every other symbol x, there are no transitions out of 3 or 5, and state 1 
goes only to itself. Thus, there are transitions from 135 to 1 on every symbol
in Σ other than b, e, and w. We use the notation Σ − b − e − w to represent
this set, and use similar representations of other sets in which a few symbols
are removed from Σ.
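The transition rule can be checked by running a lazy subset construction on the keyword NFA. In this Python sketch (names hypothetical), a single placeholder character, assumed here to be "#" and assumed not to occur in any keyword, stands for all the symbols of Σ that appear in no keyword, since they all behave alike. Only DFA states reachable from {q_0} are built.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def keyword_nfa(keywords: List[str]) -> Tuple[Dict[Tuple[int, str], Set[int]], int]:
    """Keyword NFA with start state 1; returns its arcs and its number of states."""
    delta: Dict[Tuple[int, str], Set[int]] = {}
    next_state = 2
    for word in keywords:
        prev = 1
        for symbol in word:
            delta.setdefault((prev, symbol), set()).add(next_state)
            prev, next_state = next_state, next_state + 1
    return delta, next_state - 1

def dfa_move(delta, S: FrozenSet[int], x: str) -> FrozenSet[int]:
    """q_0 plus all targets on x of the states in S (q_0 = 1 is always in S)."""
    targets = {1}                          # q_0's self-loop on every symbol
    for p in S:
        targets |= delta.get((p, x), set())
    return frozenset(targets)

def keyword_dfa(keywords: List[str], other: str = "#"):
    delta, n_states = keyword_nfa(keywords)
    symbols = sorted(set("".join(keywords))) + [other]   # `other` stands for the rest of Σ
    table: Dict[FrozenSet[int], Dict[str, FrozenSet[int]]] = {}
    todo = [frozenset({1})]
    while todo:
        S = todo.pop()
        if S in table:
            continue
        table[S] = {x: dfa_move(delta, S, x) for x in symbols}
        todo.extend(table[S].values())
    return table, n_states

table, n_states = keyword_dfa(["web", "ebay"])
row = table[frozenset({1, 3, 5})]
print(sorted(row["b"]), sorted(row["e"]), sorted(row["w"]), sorted(row["#"]))
# [1, 4, 6] [1, 5] [1, 2] [1]
print(len(table), n_states)   # 8 8
```

The last line illustrates the claim of Section 2.4.3 on this example: the DFA has no more states than the NFA.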


2.4.4 Exercises for Section 2.4 


Exercise 2.4.1: Design NFA’s to recognize the following sets of strings. 
* a) abc, abd, and aacd. Assume the alphabet is {a, b, c, d}.
b) 0101, 101, and 011.
c) ab, bc, and ca. Assume the alphabet is {a,b, c}. 


Exercise 2.4.2: Convert each of your NFA’s from Exercise 2.4.1 to DFA’s. 


2.5 Finite Automata With Epsilon-Transitions 


We shall now introduce another extension of the finite automaton. The new
“feature” is that we allow a transition on ε, the empty string. In effect, an
NFA is allowed to make a transition spontaneously, without receiving an input
symbol. Like the nondeterminism added in Section 2.3, this new capability does
not expand the class of languages that can be accepted by finite automata, but it
does give us some added “programming convenience.” We shall also see, when
we take up regular expressions in Section 3.1, how NFA’s with ε-transitions,
which we call ε-NFA’s, are closely related to regular expressions and useful
in proving the equivalence between the classes of languages accepted by finite
automata and by regular expressions.


2.5.1 Uses of ε-Transitions


We shall begin with an informal treatment of ε-NFA’s, using transition diagrams
with ε allowed as a label. In the examples to follow, think of the automaton
as accepting those sequences of labels along paths from the start state to an
accepting state. However, each ε along a path is “invisible”; i.e., it contributes
nothing to the string along the path.


Example 2.16: In Fig. 2.18 is an ε-NFA that accepts decimal numbers
consisting of:


1. An optional + or - sign,
2. A string of digits,
3. A decimal point, and


4. Another string of digits. Either this string of digits, or the string (2) can
be empty, but at least one of the two strings of digits must be nonempty.


Of particular interest is the transition from q_0 to q_1 on any of ε, +, or -.
Thus, state q_1 represents the situation in which we have seen the sign if there
is one, and perhaps some digits, but not the decimal point. State q_2 represents
the situation where we have just seen the decimal point, and may or may not
have seen prior digits. In q_4, we have definitely seen at least one digit, but
not the decimal point. Thus, the interpretation of q_3 is that we have seen a
decimal point and at least one digit, either before or after the decimal point.
We may stay in q_3 reading whatever digits there are, and also have the option
of “guessing” the string of digits is complete and going spontaneously to q_5, the
accepting state.


Figure 2.18: An ε-NFA accepting decimal numbers


Example 2.17: The strategy we outlined in Example 2.14 for building an
NFA that recognizes a set of keywords can be simplified further if we allow
ε-transitions. For instance, the NFA recognizing the keywords web and ebay,
which we saw in Fig. 2.16, can also be implemented with ε-transitions as in
Fig. 2.19. In general, we construct a complete sequence of states for each
keyword, as if it were the only word the automaton needed to recognize. Then,
we add a new start state (state 9 in Fig. 2.19), with ε-transitions to the start
states of the automata for each of the keywords.


Figure 2.19: Using ε-transitions to help recognize keywords
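To see that the ε-version behaves like the automaton of Fig. 2.16, the Python sketch below builds one complete keyword automaton per word, each with its own start state that loops on every symbol, and joins them with ε-arcs from a new start state. The state names are hypothetical tuples rather than the numbers of Fig. 2.19, and the empty string "" stands for ε. The closure step anticipates the ε-closure defined formally in Section 2.5.3.

```python
from typing import Dict, List, Set, Tuple

EPS = ""        # the empty string plays the role of the label ε
NEW = ("new",)  # hypothetical name for the added start state

def eps_keyword_nfa(keywords: List[str]):
    """One chain per keyword; state (i, j) means 'read j symbols of keyword i'.
    Each (i, 0) loops on every symbol; NEW has ε-arcs to every (i, 0)."""
    delta: Dict[Tuple[tuple, str], Set[tuple]] = {}
    loops: Set[tuple] = set()
    accepting: Set[tuple] = set()
    for i, word in enumerate(keywords):
        delta.setdefault((NEW, EPS), set()).add((i, 0))
        loops.add((i, 0))
        for j, symbol in enumerate(word):
            delta.setdefault(((i, j), symbol), set()).add((i, j + 1))
        accepting.add((i, len(word)))
    return delta, loops, accepting

def eclose(delta, states: Set[tuple]) -> Set[tuple]:
    """All states reachable from `states` using only ε-arcs."""
    closed, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return closed

def ends_with_keyword(keywords: List[str], text: str) -> bool:
    delta, loops, accepting = eps_keyword_nfa(keywords)
    current = eclose(delta, {NEW})
    for symbol in text:
        moved = current & loops                  # self-loops on every symbol
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eclose(delta, moved)
    return bool(current & accepting)

print(ends_with_keyword(["web", "ebay"], "shop on ebay"))  # True
print(ends_with_keyword(["web", "ebay"], "webs"))          # False
```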


2.5.2 The Formal Notation for an ε-NFA


We may represent an ε-NFA exactly as we do an NFA, with one exception: the
transition function must include information about transitions on ε. Formally,
we represent an ε-NFA A by A = (Q, Σ, δ, q_0, F), where all components have
their same interpretation as for an NFA, except that δ is now a function that
takes as arguments:


1. A state in Q, and 
2. A member of Σ ∪ {ε}, that is, either an input symbol, or the symbol ε.
We require that ε, the symbol for the empty string, cannot be a member
of the alphabet Σ, so no confusion results.


Example 2.18: The ε-NFA of Fig. 2.18 is represented formally as


$$E = (\{q_0, q_1, \ldots, q_5\},\ \{., +, -, 0, 1, \ldots, 9\},\ \delta,\ q_0,\ \{q_5\})$$


where δ is defined by the transition table in Fig. 2.20.


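          ε        +, -     .        0, 1, ..., 9
→ q_0     {q_1}    {q_1}    ∅        ∅
  q_1     ∅        ∅        {q_2}    {q_1, q_4}
  q_2     ∅        ∅        ∅        {q_3}
  q_3     {q_5}    ∅        ∅        {q_3}
  q_4     ∅        ∅        {q_3}    ∅
* q_5     ∅        ∅        ∅        ∅


Figure 2.20: Transition table for Fig. 2.18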


2.5.3 Epsilon-Closures 


We shall proceed to give formal definitions of an extended transition function for
ε-NFA’s, which leads to the definition of acceptance of strings and languages by
these automata, and eventually lets us explain why ε-NFA’s can be simulated by
DFA’s. However, we first need to learn a central definition, called the ε-closure
of a state. Informally, we ε-close a state q by following all transitions out of
q that are labeled ε. However, when we get to other states by following ε, we
follow the ε-transitions out of those states, and so on, eventually finding every
state that can be reached from q along any path whose arcs are all labeled ε.
Formally, we define the ε-closure ECLOSE(q) recursively, as follows:


BASIS: State q is in ECLOSE(q).


INDUCTION: If state p is in ECLOSE(q), and there is a transition from state p
to state r labeled ε, then r is in ECLOSE(q). More precisely, if δ is the transition
function of the ε-NFA involved, and p is in ECLOSE(q), then ECLOSE(q) also
contains all the states in δ(p, ε).
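The basis and induction translate into a worklist search that stops when the induction adds nothing new. The Python sketch below (names illustrative) encodes δ for the ε-NFA of Fig. 2.18 as a dictionary following the table of Fig. 2.20, with missing entries meaning ∅ and the empty string "" standing for ε. Its output should agree with Example 2.19.

```python
from typing import Dict, Set, Tuple

EPS = ""  # the empty string stands for the label ε

# δ of the ε-NFA of Fig. 2.18 (the table of Fig. 2.20); missing entries mean ∅.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in "0123456789":
    DELTA[("q1", d)] = {"q1", "q4"}
    DELTA[("q2", d)] = {"q3"}
    DELTA[("q3", d)] = {"q3"}

def eclose(q: str) -> Set[str]:
    """ECLOSE(q): the basis puts q in; the induction adds δ(p, ε) for each p found."""
    closure = {q}                          # BASIS
    frontier = [q]
    while frontier:                        # INDUCTION, until nothing new appears
        p = frontier.pop()
        for r in DELTA.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure

for q in ["q0", "q1", "q2", "q3", "q4", "q5"]:
    print(q, sorted(eclose(q)))
```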


Example 2.19: For the automaton of Fig. 2.18, each state is its own ε-closure,
with two exceptions: ECLOSE(q_0) = {q_0, q_1} and ECLOSE(q_3) = {q_3, q_5}. The
reason is that there are only two ε-transitions, one that adds q_1 to ECLOSE(q_0)
and the other that adds q_5 to ECLOSE(q_3).
Figure 2.21: Some states and transitions


A more complex example is given in Fig. 2.21. For this collection of states,
which may be part of some ε-NFA, we can conclude that


ECLOSE(1) = {1, 2, 3, 4, 6}


Each of these states can be reached from state 1 along a path exclusively labeled
ε. For example, state 6 is reached by the path 1 → 2 → 3 → 6. State 7 is not
in ECLOSE(1), since although it is reachable from state 1, the path must use
the arc 4 → 5 that is not labeled ε. The fact that state 6 is also reached from
state 1 along a path 1 → 4 → 5 → 6 that has non-ε transitions is unimportant.
The existence of one path with all labels ε is sufficient to show state 6 is in
ECLOSE(1).


We sometimes need to apply the ε-closure to a set of states S. We do so by
taking the union of the ε-closures of the individual states; that is,


$$\mathrm{ECLOSE}(S) = \bigcup_{q \in S} \mathrm{ECLOSE}(q).$$
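As a sketch of ECLOSE(S), the Python below encodes only the arcs of Fig. 2.21 that the text names: the ε-arcs 1 → 2, 2 → 3, and 3 → 6, the arc 1 → 4 (assumed to be labeled ε, consistent with state 4 being in ECLOSE(1)), and the non-ε arc 4 → 5 under a hypothetical label a. The rest of the figure is not reproduced. A single search seeded with all of S gives the same set as the union of the individual closures.

```python
from typing import Dict, Iterable, Set, Tuple

EPS = ""  # stands for ε
# Only the arcs of Fig. 2.21 that the text names. The label "a" on 4 -> 5 is a
# hypothetical stand-in for its non-ε label; arcs leaving state 5 are omitted.
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (1, EPS): {2, 4}, (2, EPS): {3}, (3, EPS): {6}, (4, "a"): {5},
}

def eclose_set(states: Iterable[int]) -> Set[int]:
    """ECLOSE(S): one search seeded with all of S, which yields the same set
    as taking the union of ECLOSE(q) for each q in S."""
    closure = set(states)
    frontier = list(closure)
    while frontier:
        p = frontier.pop()
        for r in DELTA.get((p, EPS), set()):   # only ε-arcs are followed
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure

print(sorted(eclose_set({1})))      # [1, 2, 3, 4, 6]
print(sorted(eclose_set({2, 4})))   # [2, 3, 4, 6]
```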


2.5.4 Extended Transitions and Languages for ε-NFA’s


The ε-closure allows us to explain easily what the transitions of an ε-NFA look
like when given a sequence of (non-ε) inputs. From there, we can define what
it means for an ε-NFA to accept its input.

Suppose that E = (Q, Σ, δ, q_0, F) is an ε-NFA. We first define δ̂, the extended
transition function, to reflect what happens on a sequence of inputs. The intent
is that δ̂(q, w) is the set of states that can be reached along a path whose labels,
when concatenated, form the string w. As always, ε’s along this path do not
contribute to w. The appropriate recursive definition of δ̂ is:


BASIS: δ̂(q, ε) = ECLOSE(q). That is, if the label of the path is ε, then we can
follow only ε-labeled arcs extending from state q; that is exactly what ECLOSE
does.


INDUCTION: Suppose w is of the form xa, where a is the last symbol of w.
Note that a is a member of Σ; it cannot be ε, which is not in Σ. We compute
δ̂(q, w) as follows:


1. Let {p_1, p_2, ..., p_k} be δ̂(q, x). That is, the p_i’s are all and only the states
that we can reach from q following a path labeled x. This path may end
with one or more transitions labeled ε, and may have other ε-transitions,
as well.


2. Let ⋃_{i=1}^{k} δ(p_i, a) be the set {r_1, r_2, ..., r_m}. That is, follow all transitions
labeled a from states we can reach from q along paths labeled x. The
r_j’s are some of the states we can reach from q along paths labeled w.
The additional states we can reach are found from the r_j’s by following
ε-labeled arcs in step (3), below.


3. Then δ̂(q, w) = ECLOSE({r_1, r_2, ..., r_m}). This additional closure step
includes all the paths from q labeled w, by considering the possibility
that there are additional ε-labeled arcs that we can follow after making a
transition on the final “real” symbol, a.
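The three steps can be sketched directly in Python, reusing an encoding of δ for the ε-NFA of Fig. 2.18 in which "" stands for ε. Processing w one symbol at a time from the left computes the same sets as the recursion on w = xa. The names `eclose` and `delta_hat` are illustrative; for the prefixes ε, 5, 5., and 5.6 the loop at the end should print the sets computed by hand in Example 2.20.

```python
from typing import Dict, Iterable, Set, Tuple

EPS = ""  # stands for ε

# δ of the ε-NFA of Fig. 2.18 (Fig. 2.20); missing entries mean ∅.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in "0123456789":
    DELTA[("q1", d)] = {"q1", "q4"}
    DELTA[("q2", d)] = {"q3"}
    DELTA[("q3", d)] = {"q3"}

def eclose(states: Iterable[str]) -> Set[str]:
    closed = set(states)
    stack = list(closed)
    while stack:
        p = stack.pop()
        for r in DELTA.get((p, EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return closed

def delta_hat(q: str, w: str) -> Set[str]:
    """Extended transition function, processing w = x a one symbol at a time."""
    current = eclose({q})                    # BASIS: δ̂(q, ε) = ECLOSE(q)
    for a in w:
        # step 1: `current` is {p_1, ..., p_k} = δ̂(q, x)
        r: Set[str] = set()
        for p in current:                    # step 2: union of δ(p_i, a)
            r |= DELTA.get((p, a), set())
        current = eclose(r)                  # step 3: ε-close {r_1, ..., r_m}
    return current

for prefix in ["", "5", "5.", "5.6"]:
    print(repr(prefix), sorted(delta_hat("q0", prefix)))
```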


Example 2.20: Let us compute δ̂(q_0, 5.6) for the ε-NFA of Fig. 2.18. A
summary of the steps needed is as follows:


• δ̂(q_0, ε) = ECLOSE(q_0) = {q_0, q_1}.
• Compute δ̂(q_0, 5) as follows:


1. First compute the transitions on input 5 from the states q_0 and q_1
that we obtained in the calculation of δ̂(q_0, ε), above. That is, we
compute δ(q_0, 5) ∪ δ(q_1, 5) = {q_1, q_4}.

2. Next, ε-close the members of the set computed in step (1). We get
ECLOSE(q_1) ∪ ECLOSE(q_4) = {q_1} ∪ {q_4} = {q_1, q_4}. That set is
δ̂(q_0, 5). This two-step pattern repeats for the next two symbols.

• Compute δ̂(q_0, 5.) as follows:


1. First compute δ(q_1, .) ∪ δ(q_4, .) = {q_2} ∪ {q_3} = {q_2, q_3}.
2. Then compute


δ̂(q_0, 5.) = ECLOSE(q_2) ∪ ECLOSE(q_3) = {q_2} ∪ {q_3, q_5} = {q_2, q_3, q_5}


• Compute δ̂(q_0, 5.6) as follows:


1. First compute δ(q_2, 6) ∪ δ(q_3, 6) ∪ δ(q_5, 6) = {q_3} ∪ {q_3} ∪ ∅ =
{q_3}.


2. Then compute δ̂(q_0, 5.6) = ECLOSE(q_3) = {q_3, q_5}.


Now, we can define the language of an ε-NFA E = (Q, Σ, δ, q_0, F) in the
expected way: L(E) = {w | δ̂(q_0, w) ∩ F ≠ ∅}. That is, the language of E is
the set of strings w that take the start state to at least one accepting state. For
instance, we saw in Example 2.20 that δ̂(q_0, 5.6) contains the accepting state
q_5, so the string 5.6 is in the language of that ε-NFA.
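The definition of L(E) gives a membership test. The Python sketch below (names illustrative) repeats the encoding of Fig. 2.18 and then lists every string of length three or less over the symbols +, ., and 5 that the ε-NFA accepts.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

EPS = ""  # stands for ε
# δ of the ε-NFA of Fig. 2.18; missing entries mean ∅.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in "0123456789":
    DELTA[("q1", d)] = {"q1", "q4"}
    DELTA[("q2", d)] = {"q3"}
    DELTA[("q3", d)] = {"q3"}
START, F = "q0", {"q5"}

def eclose(states: Iterable[str]) -> Set[str]:
    closed = set(states)
    stack = list(closed)
    while stack:
        for r in DELTA.get((stack.pop(), EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return closed

def delta_hat(q: str, w: str) -> Set[str]:
    current = eclose({q})
    for a in w:
        current = eclose(set().union(*(DELTA.get((p, a), set()) for p in current)))
    return current

def accepts(w: str) -> bool:
    """w is in L(E) iff δ̂(q_0, w) ∩ F is nonempty."""
    return bool(delta_hat(START, w) & F)

# Every string of length <= 3 over "+", "." and "5" that E accepts.
accepted = ["".join(t) for n in range(4) for t in product("+.5", repeat=n)
            if accepts("".join(t))]
print(accepted)  # ['.5', '5.', '+.5', '+5.', '.55', '5.5', '55.']
```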
2.5.5 Eliminating ε-Transitions


Given any ε-NFA E, we can find a DFA D that accepts the same language as E.
The construction we use is very close to the subset construction, as the states of
D are subsets of the states of E. The only difference is that we must incorporate
ε-transitions of E, which we do through the mechanism of the ε-closure.

Let E = (Q_E, Σ, δ_E, q_0, F_E). Then the equivalent DFA


$$D = (Q_D, \Sigma, \delta_D, q_D, F_D)$$
is defined as follows:


1. Q_D is the set of subsets of Q_E. More precisely, we shall find that all
accessible states of D are ε-closed subsets of Q_E, that is, sets S ⊆ Q_E
such that S = ECLOSE(S). Put another way, the ε-closed sets of states S
are those such that any ε-transition out of one of the states in S leads to
a state that is also in S. Note that ∅ is an ε-closed set.


2. q_D = ECLOSE(q_0); that is, we get the start state of D by closing the set
consisting of only the start state of E. Note that this rule differs from
the original subset construction, where the start state of the constructed
automaton was just the set containing the start state of the given NFA.


3. F_D is those sets of states that contain at least one accepting state of E.
That is, F_D = {S | S is in Q_D and S ∩ F_E ≠ ∅}.


4. δ_D(S, a) is computed, for all a in Σ and sets S in Q_D by:


(a) Let S = {p_1, p_2, ..., p_k}.
(b) Compute ⋃_{i=1}^{k} δ_E(p_i, a); let this set be {r_1, r_2, ..., r_m}.
(c) Then δ_D(S, a) = ECLOSE({r_1, r_2, ..., r_m}).
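Rules 1 through 4 can be sketched as a worklist algorithm that builds only the accessible states of D, in the spirit of the lazy subset construction. In the Python below (an illustration; names are hypothetical), the empty frozenset plays the role of the dead state ∅. For the ε-NFA of Fig. 2.18 it should find the start state {q_0, q_1}, seven accessible states counting ∅, and the two accepting states named in Example 2.21.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

EPS = ""  # stands for ε
# δ_E of the ε-NFA of Fig. 2.18; missing entries mean ∅.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in "0123456789":
    DELTA[("q1", d)] = {"q1", "q4"}
    DELTA[("q2", d)] = {"q3"}
    DELTA[("q3", d)] = {"q3"}
SIGMA = "+-.0123456789"
F_E = {"q5"}

def eclose(states: Iterable[str]) -> FrozenSet[str]:
    closed = set(states)
    stack = list(closed)
    while stack:
        for r in DELTA.get((stack.pop(), EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return frozenset(closed)

def eliminate_epsilon(q0: str):
    """Modified subset construction; builds only the accessible states of D."""
    q_D = eclose({q0})                                   # rule 2
    delta_D: Dict[FrozenSet[str], Dict[str, FrozenSet[str]]] = {}
    todo = [q_D]
    while todo:
        S = todo.pop()
        if S in delta_D:
            continue
        delta_D[S] = {}
        for a in SIGMA:                                  # rule 4
            r = set().union(*(DELTA.get((p, a), set()) for p in S))   # step (b)
            delta_D[S][a] = eclose(r)                    # step (c)
            todo.append(delta_D[S][a])
    F_D = {S for S in delta_D if S & F_E}                # rule 3
    return q_D, delta_D, F_D

q_D, delta_D, F_D = eliminate_epsilon("q0")
print(sorted(q_D), len(delta_D))          # ['q0', 'q1'] 7
print(sorted(sorted(S) for S in F_D))     # [['q2', 'q3', 'q5'], ['q3', 'q5']]
```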


Example 2.21: Let us eliminate ε-transitions from the ε-NFA of Fig. 2.18,
which we shall call E in what follows. From E, we construct a DFA D, which
is shown in Fig. 2.22. However, to avoid clutter, we omitted from Fig. 2.22 the
dead state ∅ and all transitions to the dead state. You should imagine that for
each state shown in Fig. 2.22 there are additional transitions from any state to
∅ on any input symbols for which a transition is not indicated. Also, the state
∅ has transitions to itself on all input symbols.

Since the start state of E is q_0, the start state of D is ECLOSE(q_0), which
is {q_0, q_1}. Our first job is to find the successors of q_0 and q_1 on the various
symbols in Σ; note that these symbols are the plus and minus signs, the dot,
and the digits 0 through 9. On + and -, q_1 goes nowhere in Fig. 2.18, while
q_0 goes to q_1. Thus, to compute δ_D({q_0, q_1}, +) we start with {q_1} and ε-close
it. Since there are no ε-transitions out of q_1, we have δ_D({q_0, q_1}, +) = {q_1}.
Similarly, δ_D({q_0, q_1}, -) = {q_1}. These two transitions are shown by one arc
in Fig. 2.22.
Figure 2.22: The DFA D that eliminates ε-transitions from Fig. 2.18


Next, we need to compute δ_D({q_0, q_1}, .). Since q_0 goes nowhere on the
dot, and q_1 goes to q_2 in Fig. 2.18, we must ε-close {q_2}. As there are no
ε-transitions out of q_2, this state is its own closure, so δ_D({q_0, q_1}, .) = {q_2}.

Finally, we must compute δ_D({q_0, q_1}, 0), as an example of the transitions
from {q_0, q_1} on all the digits. We find that q_0 goes nowhere on the digits, but
q_1 goes to both q_1 and q_4. Since neither of those states has ε-transitions out,
we conclude δ_D({q_0, q_1}, 0) = {q_1, q_4}, and likewise for the other digits.

We have now explained the arcs out of {q_0, q_1} in Fig. 2.22. The other
transitions are computed similarly, and we leave them for you to check. Since
q_5 is the only accepting state of E, the accepting states of D are those accessible
states that contain q_5. These two sets, {q_3, q_5} and {q_2, q_3, q_5}, are indicated
by double circles in Fig. 2.22.


Theorem 2.22: A language L is accepted by some ε-NFA if and only if L is
accepted by some DFA.


PROOF: (If) This direction is easy. Suppose L = L(D) for some DFA. Turn
D into an ε-NFA E by adding transitions δ(q, ε) = ∅ for all states q of D.
Technically, we must also convert the transitions of D on input symbols, e.g.,
δ_D(q, a) = p into an NFA-transition to the set containing only p, that is
δ_E(q, a) = {p}. Thus, the transitions of E and D are the same, but E
explicitly states that there are no transitions out of any state on ε.


(Only-if) Let E = (Q_E, Σ, δ_E, q_0, F_E) be an ε-NFA. Apply the modified
subset construction described above to produce the DFA


$$D = (Q_D, \Sigma, \delta_D, q_D, F_D)$$


We need to show that L(D) = L(E), and we do so by showing that the extended
transition functions of E and D are the same. Formally, we show δ̂_E(q_0, w) =
δ̂_D(q_D, w) by induction on the length of w.
BASIS: If |w| = 0, then w = ε. We know δ̂_E(q_0, ε) = ECLOSE(q_0). We also
know that q_D = ECLOSE(q_0), because that is how the start state of D is defined.
Finally, for a DFA, we know that δ̂(p, ε) = p for any state p, so in particular,
δ̂_D(q_D, ε) = ECLOSE(q_0). We have thus proved that δ̂_E(q_0, ε) = δ̂_D(q_D, ε).


INDUCTION: Suppose w = xa, where a is the final symbol of w, and assume
that the statement holds for x. That is, δ̂_E(q_0, x) = δ̂_D(q_D, x). Let both these
sets of states be {p_1, p_2, ..., p_k}.

By the definition of δ̂ for ε-NFA’s, we compute δ̂_E(q_0, w) by:


1. Let {r_1, r_2, ..., r_m} be ⋃_{i=1}^{k} δ_E(p_i, a).
2. Then δ̂_E(q_0, w) = ECLOSE({r_1, r_2, ..., r_m}).


If we examine the construction of DFA D in the modified subset construction
above, we see that δ_D({p_1, p_2, ..., p_k}, a) is constructed by the same two steps
(1) and (2) above. Thus, δ̂_D(q_D, w), which is δ_D({p_1, p_2, ..., p_k}, a), is the same
set as δ̂_E(q_0, w). We have now proved that δ̂_E(q_0, w) = δ̂_D(q_D, w) and completed
the inductive part.
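The proof shows δ̂_E(q_0, w) = δ̂_D(q_D, w) for every w. A brute-force comparison on short strings is no substitute for the induction, but it is a quick sanity check on a concrete case. The Python sketch below (names hypothetical) builds D from the ε-NFA of Fig. 2.18, runs D by table lookup only, and compares the result with δ̂_E on every string of length at most four over +, -, ., 0, and 5.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

EPS = ""  # stands for ε
# δ_E of the ε-NFA of Fig. 2.18; missing entries mean ∅.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in "0123456789":
    DELTA[("q1", d)] = {"q1", "q4"}
    DELTA[("q2", d)] = {"q3"}
    DELTA[("q3", d)] = {"q3"}
SIGMA = "+-.0123456789"

def eclose(states: Iterable[str]) -> FrozenSet[str]:
    closed = set(states)
    stack = list(closed)
    while stack:
        for r in DELTA.get((stack.pop(), EPS), set()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return frozenset(closed)

def step(S: Iterable[str], a: str) -> FrozenSet[str]:
    """Steps (1) and (2) of the proof: ε-close the union of δ_E(p_i, a)."""
    return eclose(set().union(*(DELTA.get((p, a), set()) for p in S)))

def delta_hat_E(q: str, w: str) -> FrozenSet[str]:
    current = eclose({q})                 # basis: ECLOSE(q)
    for a in w:
        current = step(current, a)
    return current

# The DFA D as a lookup table over its accessible states only.
q_D = eclose({"q0"})
delta_D: Dict[Tuple[FrozenSet[str], str], FrozenSet[str]] = {}
seen, todo = {q_D}, [q_D]
while todo:
    S = todo.pop()
    for a in SIGMA:
        T = step(S, a)
        delta_D[(S, a)] = T
        if T not in seen:
            seen.add(T)
            todo.append(T)

def delta_hat_D(w: str) -> FrozenSet[str]:
    state = q_D                           # δ̂_D(q_D, ε) = q_D
    for a in w:
        state = delta_D[(state, a)]       # one table lookup per symbol
    return state

mismatches = [w for n in range(5) for w in map("".join, product("+-.05", repeat=n))
              if delta_hat_E("q0", w) != delta_hat_D(w)]
print(len(seen), mismatches)  # 7 []
```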


2.5.6 Exercises for Section 2.5 
* Exercise 2.5.1: Consider the following ε-NFA.


a) Compute the ε-closure of each state.
b) Give all the strings of length three or less accepted by the automaton. 
c) Convert the automaton to a DFA. 


Exercise 2.5.2: Repeat Exercise 2.5.1 for the following ε-NFA:


Exercise 2.5.3: Design ε-NFA’s for the following languages. Try to use
ε-transitions to simplify your design.


a) The set of strings consisting of zero or more a’s followed by zero or more 
b’s, followed by zero or more c’s. 
b) The set of strings that consist of either 01 repeated one or more times or
010 repeated one or more times.


c) The set of strings of 0’s and 1’s such that at least one of the last ten
positions is a 1.


2.6 Summary of Chapter 2


Deterministic Finite Automata: A DFA has a finite set of states and a 
finite set of input symbols. One state is designated the start state, and 
zero or more states are accepting states. A transition function determines 
how the state changes each time an input symbol is processed. 


Transition Diagrams: It is convenient to represent automata by a graph 
in which the nodes are the states, and arcs are labeled by input symbols, 
indicating the transitions of that automaton. The start state is designated 
by an arrow, and the accepting states by double circles. 


Language of an Automaton: The automaton accepts strings. A string is 
accepted if, starting in the start state, the transitions caused by processing 
the symbols of that string one-at-a-time lead to an accepting state. In 
terms of the transition diagram, a string is accepted if it is the label of a 
path from the start state to some accepting state. 


Nondeterministic Finite Automata: The NFA differs from the DFA in 
that the NFA can have any number of transitions (including zero) to next 
states from a given state on a given input symbol. 


The Subset Construction: By treating sets of states of an NFA as states 
of a DFA, it is possible to convert any NFA to a DFA that accepts the 
same language. 


ε-Transitions: We can extend the NFA by allowing transitions on an
empty input, i.e., no input symbol at all. These extended NFA’s can be 
converted to DFA’s accepting the same language. 


Text-Searching Applications: Nondeterministic finite automata are a use- 
ful way to represent a pattern matcher that scans a large body of text for 
one or more keywords. These automata are either simulated directly in 
software or are first converted to a DFA, which is then simulated. 


2.7 Gradiance Problems for Chapter 2 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 2.1: Examine the following DFA [shown on-line by the Gradiance 
system]. Identify in the list below the string that this automaton accepts. 


Problem 2.2: The finite automaton below [shown on-line by the Gradiance 
system] accepts no word of length zero, no word of length one, and only two 
words of length two (01 and 10). There is a fairly simple recurrence equation for 
the number N(k) of words of length k that this automaton accepts. Discover 
this recurrence and demonstrate your understanding by identifying the correct 
value of N(k) for some particular k. Note: the recurrence does not have an 
easy-to-use closed form, so you will have to compute the first few values by 
hand. You do not have to compute N(k) for any k greater than 14. 


Problem 2.3: Here is the transition function of a simple, deterministic au- 
tomaton with start state A and accepting state B: 


        0    1
→ A     A    B
* B     B    A


We want to show that this automaton accepts exactly those strings with an odd 
number of 1’s, or more formally: 


δ̂(A, w) = B if and only if w has an odd number of 1’s.


Here, δ̂ is the extended transition function of the automaton; that is, δ̂(A, w)
is the state that the automaton is in after processing input string w. The proof
of the statement above is an induction on the length of w. Below, we give the 
proof with reasons missing. You must give a reason for each step, and then 
demonstrate your understanding of the proof by classifying your reasons into 
the following three categories: 


A) Use of the inductive hypothesis. 


B) Reasoning about properties of deterministic finite automata, e.g., that if 
string s = yz, then δ̂(q, s) = δ̂(δ̂(q, y), z).


C) Reasoning about properties of binary strings (strings of 0’s and 1’s), e.g., 
that every string is longer than any of its proper substrings. 


Basis (|w| = 0):
1. w = ε because:


2. δ̂(A, ε) = A because:
3. ε has an even number of 1’s because:
Induction (|w| = n > 0)

4. There are two cases: (a) when w = x1 and (b) when w = x0 because:
Case (a):


5. In case (a), w has an odd number of 1’s if and only if x has an even
number of 1’s because:


6. In case (a), δ̂(A, x) = A if and only if w has an odd number of 1’s because:
7. In case (a), δ̂(A, w) = B if and only if w has an odd number of 1’s because:
Case (b):


8. In case (b), w has an odd number of 1’s if and only if x has an odd number
of 1’s because:


9. In case (b), δ̂(A, x) = B if and only if w has an odd number of 1’s because:
10. In case (b), δ̂(A, w) = B if and only if w has an odd number of 1’s because:
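A brute-force check cannot replace the inductive proof of Problem 2.3, but it can catch a misread transition table before you start. The Python sketch below (function names hypothetical) tests the claim on every binary string of length at most 10.

```python
from itertools import product

# Transition table of Problem 2.3: start state A, accepting state B.
DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "B", ("B", "1"): "A"}

def delta_hat(q: str, w: str) -> str:
    """Extended transition function of the DFA: δ̂(q, xa) = δ(δ̂(q, x), a)."""
    for a in w:
        q = DELTA[(q, a)]
    return q

def claim_holds(max_len: int) -> bool:
    """Check that δ̂(A, w) = B iff w has an odd number of 1's, for all |w| <= max_len."""
    return all((delta_hat("A", "".join(t)) == "B") == ("".join(t).count("1") % 2 == 1)
               for n in range(max_len + 1) for t in product("01", repeat=n))

print(claim_holds(10))  # True
```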


Problem 2.4: Convert the following nondeterministic finite automaton [shown 
on-line by the Gradiance system] to a DFA, including the dead state, if neces- 
sary. Which of the following sets of NFA states is not a state of the DFA that 
is accessible from the start state of the DFA? 


Problem 2.5: The following nondeterministic finite automaton [shown on-line 
by the Gradiance system] accepts which of the following strings? 


Problem 2.6: Here is a nondeterministic finite automaton with epsilon-trans- 
itions [shown on-line by the Gradiance system]. Suppose we use the extended 
subset construction from Section 2.5.5 to convert this epsilon-NFA to a deter- 
ministic finite automaton with a dead state, with all transitions defined, and 
with no state that is inaccessible from the start state. Which of the following 
would be a transition of the DFA? 


Problem 2.7: Here is an epsilon-NFA [shown on-line by the Gradiance sys- 
tem]. Suppose we construct an equivalent DFA by the construction of Section 
2.5.5. That is, start with the epsilon-closure of the start state A. For each set of 
states S we construct (which becomes one state of the DFA), look at the tran- 
sitions from this set of states on input symbol 0. See where those transitions 
lead, and take the union of the epsilon-closures of all the states reached on 0. 
This set of states becomes a state of the DFA. Do the same for the transitions 
out of S on input 1. When we have found all the sets of epsilon-NFA states 
that are constructed in this way, we have the DFA and its transitions. Carry 
out this construction of a DFA, and identify one of the states of this DFA (as 
a subset of the epsilon-NFA’s states) from the list below. 
Problem 2.8: Identify which automata [in a set of diagrams shown on-line
by the Gradiance system] define the same language and provide the correct 
counterexample if they don’t. Choose the correct statement from the list below. 


Problem 2.9: Examine the following DFA [shown on-line by the Gradiance
system]. This DFA accepts a certain language L. In this problem we shall
consider certain other languages that are defined by their tails, that is, languages
of the form (0 + 1)*w, for some particular string w of 0’s and 1’s. Call this
language L(w). Depending on w, L(w) may be contained in L, disjoint from L,
or neither contained nor disjoint from L (i.e., some strings of the form xw are
in L and others are not). Your problem is to find a way to classify w into one of
these three cases. Then, use your knowledge to classify the following languages:


1. L(1111001), i.e., the language of regular expression (0 + 1)*1111001.
2. L(11011), i.e., the language of regular expression (0 + 1)*11011.
3. L(110101), i.e., the language of regular expression (0 + 1)*110101.
4. L(00011101), i.e., the language of regular expression (0 + 1)*00011101.


Problem 2.10: Here is a nondeterministic finite automaton [shown on-line by 
the Gradiance system]. Convert this NFA to a DFA, using the “lazy” version of 
the subset construction described in Section 2.3.5, so only the accessible states 
are constructed. Which of the following sets of NFA states becomes a state of 
the DFA? 


Problem 2.11: Here is a nondeterministic finite automaton [shown on-line by 
the Gradiance system]. Some input strings lead to more than one state. Find, 
in the list below, a string that leads from the start state A to three different 
states (possibly including A). 


2.8 References for Chapter 2 
Chapter 3


Regular Expressions and 
Languages 


We begin this chapter by introducing the notation called “regular expressions.” 
These expressions are another type of language-defining notation, which we 
sampled briefly in Section 1.1.2. Regular expressions also may be thought of as 
a “programming language,” in which we express some important applications, 
such as text-search applications or compiler components. Regular expressions 
are closely related to nondeterministic finite automata and can be thought of 
as a “user-friendly” alternative to the NFA notation for describing software 
components. 

In this chapter, after defining regular expressions, we show that they are 
capable of defining all and only the regular languages. We discuss the way 
that regular expressions are used in several software systems. Then, we exam- 
ine the algebraic laws that apply to regular expressions. They have significant 
resemblance to the algebraic laws of arithmetic, yet there are also some im- 
portant differences between the algebras of regular expressions and arithmetic 
expressions. 


3.1 Regular Expressions 


Now, we switch our attention from machine-like descriptions of languages — 
deterministic and nondeterministic finite automata — to an algebraic descrip- 
tion: the “regular expression.” We shall find that regular expressions can define 
exactly the same languages that the various forms of automata describe: the 
regular languages. However, regular expressions offer something that automata 
do not: a declarative way to express the strings we want to accept. Thus, 
regular expressions serve as the input language for many systems that process 
strings. Examples include: 
1. Search commands such as the UNIX grep or equivalent commands for
finding strings that one sees in Web browsers or text-formatting systems. 
These systems use a regular-expression-like notation for describing pat- 
terns that the user wants to find in a file. Different search systems convert 
the regular expression into either a DFA or an NFA, and simulate that 
automaton on the file being searched. 


2. Lexical-analyzer generators, such as Lex or Flex. Recall that a lexical 
analyzer is the component of a compiler that breaks the source program 
into logical units (called tokens) of one or more characters that have a 
shared significance. Examples of tokens include keywords (e.g., while), 
identifiers (e.g., any letter followed by zero or more letters and/or digits), 
and signs, such as + or <=. A lexical-analyzer generator accepts descrip- 
tions of the forms of tokens, which are essentially regular expressions, and 
produces a DFA that recognizes which token appears next on the input. 
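As a rough illustration of how token forms are described by regular expressions, here is a minimal Python sketch using the standard re module. The token set (the keyword while, identifiers, and the signs + and <=) and the function name tokenize are assumptions made for illustration. Unlike Lex or Flex, Python's re is a backtracking matcher rather than a generated DFA, and it takes the first alternative that matches rather than the longest match, so the keyword pattern is listed before the identifier pattern.

```python
import re

# Hypothetical token set, for illustration only: (token name, regular expression).
# Order matters: re tries alternatives left to right and keeps the first that matches.
TOKEN_SPEC = [
    ("KEYWORD", r"while\b"),              # the keyword while, not a prefix of a longer word
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),   # a letter followed by letters and/or digits
    ("SIGN", r"<=|\+"),                   # the signs <= and +
    ("SKIP", r"\s+"),                     # whitespace between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return (token name, lexeme) pairs for text, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, tokenize("while x1 <= y + z") yields the keyword while, the identifiers x1, y and z, and the signs <= and +, while tokenize("whilex") yields a single identifier.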


3.1.1 The Operators of Regular Expressions 


Regular expressions denote languages. For a simple example, the regular ex- 
pression 01* + 10* denotes the language consisting of all strings that are either 
a single 0 followed by any number of 1’s or a single 1 followed by any number 
of 0’s. We do not expect you to know at this point how to interpret regular 
expressions, so our statement about the language of this expression must be 
accepted on faith for the moment. We shortly shall define all the symbols used 
in this expression, so you can see why our interpretation of this regular expres- 
sion is the correct one. Before describing the regular-expression notation, we 
need to learn the three operations on languages that the operators of regular 
expressions represent. These operations are: 


1. The union of two languages L and M, denoted L ∪ M, is the set of strings
that are in either L or M, or both. For example, if L = {001, 10, 111} and
M = {ε, 001}, then L ∪ M = {ε, 10, 001, 111}.


2. The concatenation of languages L and M is the set of strings that can 
be formed by taking any string in L and concatenating it with any string 
in M. Recall Section 1.5.2, where we defined the concatenation of a 
pair of strings; one string is followed by the other to form the result of the 
concatenation. We denote concatenation of languages either with a dot or 
with no operator at all, although the concatenation operator is frequently 
called “dot.” For example, if L = {001, 10, 111} and M = {ε, 001}, then
L.M, or just LM, is {001, 10, 111, 001001, 10001, 111001}. The first three
strings in LM are the strings in L concatenated with ε. Since ε is the
identity for concatenation, the resulting strings are the same as the strings
of L. However, the last three strings in LM are formed by taking each 
string in L and concatenating it with the second string in M, which is 
001. For instance, 10 from L concatenated with 001 from M gives us 
10001 for LM. 
3. The closure (or star, or Kleene closure)¹ of a language L is denoted L*
and represents the set of those strings that can be formed by taking any
number of strings from L, possibly with repetitions (i.e., the same string
may be selected more than once) and concatenating all of them. For
instance, if L = {0, 1}, then L* is all strings of 0’s and 1’s. If L = {0, 11},
then L* consists of those strings of 0’s and 1’s such that the 1’s come in
pairs, e.g., 011, 11110, and ε, but not 01011 or 101. More formally, L* is
the infinite union $\bigcup_{i \ge 0} L^i$, where $L^0$ = {ε}, $L^1$ = L, and $L^i$, for i > 1, is
$LL \cdots L$ (the concatenation of i copies of L).
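These operations are easy to experiment with on finite languages. In the minimal Python sketch below, a language is a set of strings and ε is the empty string ""; the function names are our own. Closure is the one operation whose result is usually infinite, so instead of building L* the sketch decides membership in it: for a finite L, a string w is in L* exactly when w can be cut into consecutive pieces that all belong to L, which in_closure checks by dynamic programming over the prefixes of w.

```python
def union(L, M):
    """L ∪ M: the strings in L, in M, or in both."""
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}


def in_closure(w, L):
    """Decide whether w is in L*, for a finite language L (a set of strings).

    reachable[i] is True when the prefix w[:i] is a concatenation of zero
    or more strings of L; reachable[0] is True because ε is always in L*.
    """
    reachable = [False] * (len(w) + 1)
    reachable[0] = True
    for i in range(len(w)):
        if not reachable[i]:
            continue
        for x in L:
            # The empty string adds nothing to a concatenation, so skip it.
            if x and w.startswith(x, i):
                reachable[i + len(x)] = True
    return reachable[len(w)]


# The running example; "" plays the role of ε.
L = {"001", "10", "111"}
M = {"", "001"}
```

With these definitions, union(L, M) and concat(L, M) reproduce the sets computed in items 1 and 2, and in_closure("11110", {"0", "11"}) is True while in_closure("101", {"0", "11"}) is False, matching the pairs-of-1's description.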


Example 3.1: Since the idea of the closure of a language is somewhat tricky,
let us study a few examples. First, let L = {0, 11}. $L^0$ = {ε}, independent of
what language L is; the 0th power represents the selection of zero strings from
L. $L^1$ = L, which represents the choice of one string from L. Thus, the first
two terms in the expansion of L* give us {ε, 0, 11}.

Next, consider $L^2$. We pick two strings from L, with repetitions allowed, so
there are four choices. These four selections give us $L^2$ = {00, 011, 110, 1111}.
Similarly, $L^3$ is the set of strings that may be formed by making three choices
of the two strings in L and gives us


{000, 0011, 0110, 1100, 01111, 11011, 11110, 111111}


To compute L*, we must compute $L^i$ for each i, and take the union of all these
languages. Here, $L^i$ has $2^i$ members. Although each $L^i$ is finite, the union of the
infinite number of terms $L^i$ is generally an infinite language, as it is in our
example.
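The powers in Example 3.1 can be generated mechanically. A short Python sketch, with languages as sets of strings, ε as "", and a helper name (power) of our own choosing:

```python
def power(L, i):
    """Return L^i, the concatenation of i copies of L; L^0 = {ε}, written {""}."""
    result = {""}
    for _ in range(i):
        # Extend every string chosen so far by one more string from L.
        result = {x + y for x in result for y in L}
    return result
```

power({"0", "11"}, 3) reproduces the eight strings listed above. The count $2^i$ holds for this L because neither 0 nor 11 is a prefix of the other, so different sequences of choices always give different strings; for a language such as {0, 00}, some choices coincide and $L^i$ is smaller.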

Now, let L be the set of all strings of 0’s. Note that L is infinite, unlike
our previous example, which is a finite language. However, it is not hard to
discover what L* is. $L^0$ = {ε}, as always. $L^1$ = L. $L^2$ is the set of strings that
can be formed by taking one string of 0’s and concatenating it with another
string of 0’s. The result is still a string of 0’s. In fact, every string of 0’s
can be written as the concatenation of two strings of 0’s (don’t forget that ε
is a “string of 0’s”; this string can always be one of the two strings that we
concatenate). Thus, $L^2$ = L. Likewise, $L^3$ = L, and so on. Thus, the infinite
union $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$ is L in the particular case that the language L
is the set of all strings of 0’s.

For a final example, ∅* = {ε}. Note that $\emptyset^0$ = {ε}, while $\emptyset^i$, for any $i \ge 1$,
is empty, since we can’t select any strings from the empty set. In fact, ∅ is one
of only two languages whose closure is not infinite.


3.1.2 Building Regular Expressions 


Algebras of all kinds start with some elementary expressions, usually constants
and/or variables. Algebras then allow us to construct more expressions by
applying a certain set of operators to these elementary expressions and to
previously constructed expressions. Usually, some method of grouping operators
with their operands, such as parentheses, is required as well. For instance,
the familiar arithmetic algebra starts with constants such as integers and real
numbers, plus variables, and builds more complex expressions with arithmetic
operators such as + and ×.


Use of the Star Operator


We saw the star operator first in Section 1.5.2, where we applied it to an
alphabet, e.g., Σ*. That operator formed all strings whose symbols were
chosen from alphabet Σ. The closure operator is essentially the same,
although there is a subtle distinction of types.

Suppose L is the language containing strings of length 1, and for each
symbol a in Σ there is a string a in L. Then, although L and Σ “look”
the same, they are of different types; L is a set of strings, and Σ is a set
of symbols. On the other hand, L* denotes the same language as Σ*.


¹The term “Kleene closure” refers to S. C. Kleene, who originated the regular expression
notation and this operator.

The algebra of regular expressions follows this pattern, using constants and 
variables that denote languages, and operators for the three operations of Sec- 
tion 3.1.1 —union, dot, and star. We can describe the regular expressions 
recursively, as follows. In this definition, we not only describe what the le- 
gal regular expressions are, but for each regular expression E, we describe the 
language it represents, which we denote L(E). 


BASIS: The basis consists of three parts: 


1. The constants ε and ∅ are regular expressions, denoting the languages {ε}
and ∅, respectively. That is, L(ε) = {ε}, and L(∅) = ∅.


2. If a is any symbol, then a is a regular expression. This expression denotes
the language {a}. That is, L(a) = {a}. Note that we use boldface font
to denote an expression corresponding to a symbol. The correspondence,
e.g. that a refers to a, should be obvious.


3. A variable, usually capitalized and italic such as L, is a regular expression,
representing any language.


INDUCTION: There are four parts to the inductive step, one for each of the 
three operators and one for the introduction of parentheses. 


1. If E and F are regular expressions, then E + F is a regular expression
denoting the union of L(E) and L(F). That is, L(E + F) = L(E) ∪ L(F).


2. If E and F are regular expressions, then EF is a regular expression denot-
ing the concatenation of L(E) and L(F). That is, L(EF) = L(E)L(F).

Note that the dot can optionally be used to denote the concatenation op-
erator, either as an operation on languages or as the operator in a regular
expression. For instance, 0.1 is a regular expression meaning the same as
01 and representing the language {01}. However, we shall avoid the dot
as concatenation in regular expressions.²


3. If E is a regular expression, then E* is a regular expression, denoting the
closure of L(E). That is, L(E*) = (L(E))*.


4. If E is a regular expression, then (E), a parenthesized E, is also a regular
expression, denoting the same language as E. Formally, L((E)) = L(E).


Expressions and Their Languages


Strictly speaking, a regular expression E is just an expression, not a lan-
guage. We should use L(E) when we want to refer to the language that E
denotes. However, it is common usage to say “E” when we really
mean “L(E).” We shall use this convention as long as it is clear we are
talking about a language and not about a regular expression.
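The recursive definition translates almost line for line into a recursive function. The Python sketch below represents an expression as a small tree (the class names Empty, Eps, Sym, Plus, Dot and Star are our own) and, because L(E) may be infinite, computes only the strings of L(E) whose length is at most a bound n. The bound is an assumption of the sketch, not part of the definition.

```python
from dataclasses import dataclass


# Basis: the constants ∅ and ε, and single symbols.
@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Sym:
    a: str


# Induction: +, concatenation, and star. Parentheses need no node of their
# own, because L((E)) = L(E); they only decide the shape of the tree.
@dataclass(frozen=True)
class Plus:
    e: object
    f: object


@dataclass(frozen=True)
class Dot:
    e: object
    f: object


@dataclass(frozen=True)
class Star:
    e: object


def lang(E, n):
    """Return the strings of L(E) of length at most n."""
    if isinstance(E, Empty):
        return set()                                   # L(∅) = ∅
    if isinstance(E, Eps):
        return {""}                                    # L(ε) = {ε}
    if isinstance(E, Sym):
        return {E.a} if len(E.a) <= n else set()       # L(a) = {a}
    if isinstance(E, Plus):
        return lang(E.e, n) | lang(E.f, n)             # L(E + F) = L(E) ∪ L(F)
    if isinstance(E, Dot):                             # L(EF) = L(E)L(F)
        return {x + y for x in lang(E.e, n) for y in lang(E.f, n) if len(x + y) <= n}
    if isinstance(E, Star):                            # L(E*) = (L(E))*
        pieces = lang(E.e, n) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            # Append one more piece; keep only new strings within the bound.
            frontier = {x + y for x in frontier for y in pieces if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {E!r}")
```

For example, lang(Plus(Dot(Sym("0"), Star(Sym("1"))), Dot(Sym("1"), Star(Sym("0")))), 3) gives the strings of 01* + 10* up to length 3, and lang(Star(Empty()), 5) is {""}, in line with ∅* = {ε}.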


Example 3.2: Let us write a regular expression for the set of strings that 
consist of alternating 0’s and 1’s. First, let us develop a regular expression 
for the language consisting of the single string 01. We can then use the star 
operator to get an expression for all strings of the form 0101···01.

The basis rule for regular expressions tells us that 0 and 1 are expressions 
denoting the languages {0} and {1}, respectively. If we concatenate the two 
expressions, we get a regular expression for the language {01}; this expression is 
01. As a general rule, if we want a regular expression for the language consisting 
of only the string w, we use w itself as the regular expression. Note that in the 
regular expression, the symbols of w will normally be written in boldface, but 
the change of font is only to help you distinguish expressions from strings and 
should not be taken as significant. 

Now, to get all strings consisting of zero or more occurrences of 01, we use 
the regular expression (01)*. Note that we first put parentheses around 01, to 
avoid confusion with the expression 01*, whose language is all strings consisting
of a 0 and any number of 1’s. The reason for this interpretation is explained 
in Section 3.1.3, but briefly, star takes precedence over dot, and therefore the 
argument of the star is selected before performing any concatenations. 

However, L((01)*) is not exactly the language that we want. It includes
only those strings of alternating 0’s and 1’s that begin with 0 and end with 1.
We also need to consider the possibility that there is a 1 at the beginning and/or
a 0 at the end. One approach is to construct three more regular expressions that
handle the other three possibilities. That is, (10)* represents those alternating
strings that begin with 1 and end with 0, while 0(10)* can be used for strings
that both begin and end with 0 and 1(01)* serves for strings that begin and
end with 1. The entire regular expression is


(01)* + (10)* + 0(10)* + 1(01)*


Notice that we use the + operator to take the union of the four languages that
together give us all the strings with alternating 0’s and 1’s.


²In fact, UNIX regular expressions use the dot for an entirely different purpose: represent-
ing any ASCII character.

However, there is another approach that yields a regular expression that 
looks rather different and is also somewhat more succinct. Start again with the 
expression (01)*. We can add an optional 1 at the beginning if we concatenate 
on the left with the expression ε + 1. Likewise, we add an optional 0 at the end
with the expression ε + 0. For instance, using the definition of the + operator:


L(ε + 1) = L(ε) ∪ L(1) = {ε} ∪ {1} = {ε, 1}


If we concatenate this language with any other language L, the ε choice gives
us all the strings in L, while the 1 choice gives us 1w for every string w in L.
Thus, another expression for the set of strings that alternate 0’s and 1’s is:


(ε + 1)(01)*(ε + 0)


Note that we need parentheses around each of the added expressions, to make 
sure the operators group properly. 
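Since the two expressions are meant to describe the same language, a quick sanity check is to compare them on every short string. The sketch below transcribes both into Python's re syntax (| for +, and 1? and 0? for ε + 1 and ε + 0) and compares them with a direct test for alternation, on all strings of 0's and 1's up to a chosen length. The names are our own, and agreement up to a bound is evidence of equivalence, not a proof.

```python
import re
from itertools import product

# Example 3.2's two expressions in Python's re syntax.
FOUR_CASES = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
SUCCINCT = re.compile(r"1?(01)*0?")


def alternates(w):
    """True when no two adjacent symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))


def agree_up_to(max_len):
    """Compare both patterns with alternates() on every 0/1 string up to max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            four = FOUR_CASES.fullmatch(w) is not None
            short = SUCCINCT.fullmatch(w) is not None
            if not (four == short == alternates(w)):
                return False
    return True
```

fullmatch is used rather than match so that a pattern must account for the whole string, which is what membership in a language means.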


3.1.3 Precedence of Regular-Expression Operators 


Like other algebras, the regular-expression operators have an assumed order of 
“precedence,” which means that operators are associated with their operands in 
a particular order. We are familiar with the notion of precedence from ordinary 
arithmetic expressions. For instance, we know that xy + z groups the product xy
before the sum, so it is equivalent to the parenthesized expression (xy) + z and
not to the expression x(y + z). Similarly, we group two of the same operators
from the left in arithmetic, so x − y − z is equivalent to (x − y) − z, and not to
x − (y − z). For regular expressions, the following is the order of precedence for
the operators: 


1. The star operator is of highest precedence. That is, it applies only to 
the smallest sequence of symbols to its left that is a well-formed regular 
expression. 


2. Next in precedence comes the concatenation or “dot” operator. After 
grouping all stars to their operands, we group concatenation operators 
to their operands. That is, all expressions that are juxtaposed (adjacent, 
with no intervening operator) are grouped together. Since concatenation 
is an associative operator, it does not matter in what order we group
consecutive concatenations, although if there is a choice to be made, you 
should group them from the left. For instance, 012 is grouped (01)2. 


3. Finally, all unions (+ operators) are grouped with their operands. Since 
union is also associative, it again matters little in which order consecutive 
unions are grouped, but we shall assume grouping from the left. 


Of course, sometimes we do not want the grouping in a regular expression 
to be as required by the precedence of the operators. If so, we are free to use 
parentheses to group operands exactly as we choose. In addition, there is never 
anything wrong with putting parentheses around operands that you want to 
group, even if the desired grouping is implied by the rules of precedence. 


Example 3.3: The expression 01* + 1 is grouped (0(1*)) + 1. The star 
operator is grouped first. Since the symbol 1 immediately to its left is a legal 
regular expression, that alone is the operand of the star. Next, we group the 
concatenation between 0 and (1*), giving us the expression (0(1*)). Finally, 
the union operator connects the latter expression and the expression to its right, 
which is 1. 

Notice that the language of the given expression, grouped according to the 
precedence rules, is the string 1 plus all strings consisting of a 0 followed by any 
number of 1’s (including none). Had we chosen to group the dot before the star, 
we could have used parentheses, as (01)* + 1. The language of this expression
is the string 1 and all strings that repeat 01, zero or more times. Had we wished 
to group the union first, we could have added parentheses around the union to 
make the expression 0(1* + 1). That expression’s language is the set of strings 
that begin with 0 and have any number of 1’s following. 
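The precedence rules can be made concrete with a small recursive-descent parser that writes out every parenthesis the rules imply. This Python sketch assumes single-character operands, + for union, * for star, and juxtaposition for concatenation; the function name group is hypothetical. Each precedence level gets its own function, and each level calls the next tighter one, which is what makes star bind most tightly and union least.

```python
def group(expr):
    """Fully parenthesize expr using star > concatenation > union,
    grouping consecutive concatenations and unions from the left."""
    s = expr.replace(" ", "")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def union():                      # lowest precedence: E + F + ...
        nonlocal pos
        left = concat()
        while peek() == "+":
            pos += 1
            left = f"({left}+{concat()})"
        return left

    def concat():                     # juxtaposed operands, grouped from the left
        left = star()
        while peek() is not None and peek() not in "+)":
            left = f"({left}{star()})"
        return left

    def star():                       # highest precedence: applies to one operand
        nonlocal pos
        operand = atom()
        while peek() == "*":
            pos += 1
            operand = f"({operand}*)"
        return operand

    def atom():                       # a single symbol or a parenthesized expression
        nonlocal pos
        c = peek()
        if c is None or c in "+*)":
            raise ValueError(f"expected an operand at position {pos}")
        pos += 1
        if c != "(":
            return c
        inner = union()
        if peek() != ")":
            raise ValueError(f"missing ')' at position {pos}")
        pos += 1
        return inner

    result = union()
    if pos != len(s):
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return result
```

group("01* + 1") returns "((0(1*))+1)", the grouping of Example 3.3 with the outermost parentheses made explicit, and group("012") returns "((01)2)".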


3.1.4 Exercises for Section 3.1 


Exercise 3.1.1: Write regular expressions for the following languages: 


* a) The set of strings over alphabet {a, b, c} containing at least one a and at
least one b. 


b) The set of strings of 0’s and 1’s whose tenth symbol from the right end is 
1. 


c) The set of strings of 0’s and 1’s with at most one pair of consecutive 1’s. 


Exercise 3.1.2: Write regular expressions for the following languages: 


* a) The set of all strings of 0’s and 1’s such that every pair of adjacent 0’s 
appears before any pair of adjacent 1’s. 


b) The set of strings of 0’s and 1’s whose number of 0’s is divisible by five. 
!! Exercise 3.1.3: Write regular expressions for the following languages:
a) The set of all strings of 0’s and 1’s not containing 101 as a substring. 


b) The set of all strings with an equal number of 0’s and 1’s, such that no 
prefix has two more 0’s than 1’s, nor two more 1’s than 0’s. 


c) The set of strings of 0’s and 1’s whose number of 0’s is divisible by five 
and whose number of 1’s is even. 


! Exercise 3.1.4: Give English descriptions of the languages of the following 
regular expressions: 


* a) (1 + ε)(00*1)*0*.
b) (0*1*)*000(0 + 1)*.
c) (0 + 10)*1*.


*! Exercise 3.1.5: In Example 3.1 we pointed out that ∅ is one of two languages
whose closure is finite. What is the other? 


3.2 Finite Automata and Regular Expressions 


While the regular-expression approach to describing languages is fundamentally 
different from the finite-automaton approach, these two notations turn out to 
represent exactly the same set of languages, which we have termed the “reg- 
ular languages.” We have already shown that deterministic finite automata, 
and the two kinds of nondeterministic finite automata (with and without
ε-transitions), accept the same class of languages. In order to show that the
regular expressions define the same class, we must show that: 


1. Every language defined by one of these automata is also defined by a 
regular expression. For this proof, we can assume the language is accepted 
by some DFA. 


2. Every language defined by a regular expression is defined by one of these 
automata. For this part of the proof, the easiest approach is to show that there is
an NFA with ε-transitions accepting the same language.


Figure 3.1 shows all the equivalences we have proved or will prove. An arc from 
class X to class Y means that we prove every language defined by class X is 
also defined by class Y. Since the graph is strongly connected (i.e., we can get 
from each of the four nodes to any other node) we see that all four classes are 
really the same. 
Figure 3.1: Plan for showing the equivalence of four different notations for
regular languages 


3.2.1 From DFA’s to Regular Expressions 


The construction of a regular expression to define the language of any DFA is 
surprisingly tricky. Roughly, we build expressions that describe sets of strings 
that label certain paths in the DFA’s transition diagram. However, the paths 
are allowed to pass through only a limited subset of the states. In an inductive 
definition of these expressions, we start with the simplest expressions that de- 
scribe paths that are not allowed to pass through any states (i.e., they are single 
nodes or single arcs), and inductively build the expressions that let the paths 
go through progressively larger sets of states. Finally, the paths are allowed to 
go through any state; i.e., the expressions we generate at the end represent all 
possible paths. These ideas appear in the proof of the following theorem. 


Theorem 3.4: If L = L(A) for some DFA A, then there is a regular expression 
R such that L = L(R). 


PROOF: Let us suppose that A’s states are {1,2,...,n} for some integer n. No 
matter what the states of A actually are, there will be n of them for some finite 
n, and by renaming the states, we can refer to the states in this manner, as if 
they were the first n positive integers. Our first, and most difficult, task is to 
construct a collection of regular expressions that describe progressively broader 
sets of paths in the transition diagram of A. 

Let us use $R_{ij}^{(k)}$ as the name of a regular expression whose language is the
set of strings w such that w is the label of a path from state i to state j in A, 
and that path has no intermediate node whose number is greater than k. Note 
that the beginning and end points of the path are not “intermediate,” so there 
is no constraint that i and/or j be less than or equal to k. 


Figure 3.2 suggests the requirement on the paths represented by $R_{ij}^{(k)}$. There,
the vertical dimension represents the state, from 1 at the bottom to n at the
top, and the horizontal dimension represents travel along the path. Notice that
in this diagram we have shown both i and j to be greater than k, but either or
both could be k or less. Also notice that the path passes through node k twice,
but never goes through a state higher than k, except at the endpoints.


To construct the expressions $R_{ij}^{(k)}$, we use the following inductive definition,
starting at k = 0 and finally reaching k = n. Notice that when k = n, there is
no restriction at all on the paths represented, since there are no states greater
than n.


Figure 3.2: A path whose label is in the language of regular expression $R_{ij}^{(k)}$


BASIS: The basis is k = 0. Since all states are numbered 1 or above, the 
restriction on paths is that the path must have no intermediate states at all. 
There are only two kinds of paths that meet such a condition: 


1. An arc from node (state) i to node j.


2. A path of length 0 that consists of only some node i. 


If i ≠ j, then only case (1) is possible. We must examine the DFA A and
find those input symbols a such that there is a transition from state i to state
j on symbol a.


a) If there is no such symbol a, then $R_{ij}^{(0)} = \emptyset$.


b) If there is exactly one such symbol a, then $R_{ij}^{(0)} = \mathbf{a}$.


c) If there are symbols $a_1, a_2, \ldots, a_k$ that label arcs from state i to state j,
then $R_{ij}^{(0)} = \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_k$.


However, if i = j, then the legal paths are the path of length 0 and all loops
from i to itself. The path of length 0 is represented by the regular expression
ε, since that path has no symbols along it. Thus, we add ε to the various
expressions devised in (a) through (c) above. That is, in case (a) [no symbol a]
the expression becomes ε, in case (b) [one symbol a] the expression becomes $\epsilon + \mathbf{a}$,
and in case (c) [multiple symbols] the expression becomes $\epsilon + \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_k$.


INDUCTION: Suppose there is a path from state i to state j that goes through 
no state higher than k. There are two possible cases to consider: 


1. The path does not go through state k at all. In this case, the label of the 
path is in the language of $R_{ij}^{(k-1)}$.
2. The path goes through state k at least once. Then we can break the path
into several pieces, as suggested by Fig. 3.3. The first goes from state
i to state k without passing through k, the last piece goes from k to j
without passing through k, and all the pieces in the middle go from k
to itself, without passing through k. Note that if the path goes through
state k only once, then there are no “middle” pieces, just a path from i
to k and a path from k to j. The set of labels for all paths of this type
is represented by the regular expression $R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$. That is,
the first expression represents the part of the path that gets to state k 
the first time, the second represents the portion that goes from k to itself, 
zero times, once, or more than once, and the third expression represents 
the part of the path that leaves k for the last time and goes to state j. 


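[Figure: the path from i to j cut at each visit to state k; the first segment is in $R_{ik}^{(k-1)}$, each of zero or more middle segments is in $R_{kk}^{(k-1)}$, and the last segment is in $R_{kj}^{(k-1)}$.]


Figure 3.3: A path from i to j can be broken into segments at each point where
it goes through state k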


When we combine the expressions for the paths of the two types above, we 
have the expression 


$$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$$
for the labels of all paths from state i to state j that go through no state higher
than k. If we construct these expressions in order of increasing superscript,
then, since each $R_{ij}^{(k)}$ depends only on expressions with a smaller superscript,
all expressions are available when we need them.

Eventually, we have $R_{ij}^{(n)}$ for all i and j. We may assume that state 1 is the
start state, although the accepting states could be any set of the states. The
regular expression for the language of the automaton is then the sum (union)
of all expressions $R_{1j}^{(n)}$ such that state j is an accepting state.
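A minimal Python sketch of the construction in this proof, assuming states numbered 1 through n and a transition dictionary delta keyed by (state, symbol). It emits Python re patterns rather than textbook notation: None stands for ∅, the empty pattern for ε, and a union E + F is written (?:E|F). It applies the rules ∅ + R = R, ∅R = R∅ = ∅ and ∅* = ε* = ε as it goes, which keeps the patterns from growing needlessly; otherwise it follows the basis and the inductive formula literally. All helper names are our own.

```python
import re
from itertools import product

EMPTY = None  # stands for ∅: matches nothing
EPS = ""      # stands for ε: matches only the empty string


def plus(e, f):
    """Union E + F, using ∅ + R = R + ∅ = R."""
    if e is EMPTY:
        return f
    if f is EMPTY or e == f:
        return e
    return f"(?:{e}|{f})"


def dot(e, f):
    """Concatenation EF, using ∅R = R∅ = ∅."""
    if e is EMPTY or f is EMPTY:
        return EMPTY
    return e + f  # unions and stars are already wrapped in (?:...), so juxtaposition is safe


def star(e):
    """Closure E*, using ∅* = ε* = ε."""
    if e is EMPTY or e == EPS:
        return EPS
    return f"(?:{e})*"


def dfa_to_regex(n, delta, start, accepting, alphabet):
    """Build a pattern for the language of a DFA with states 1..n.

    delta maps (state, symbol) to the next state. Returns a Python re
    pattern, or None if the language is empty.
    """
    states = range(1, n + 1)
    # Basis, k = 0: symbols on arcs from i to j, plus ε when i == j.
    R = {}
    for i in states:
        for j in states:
            expr = EMPTY
            for a in alphabet:
                if delta.get((i, a)) == j:
                    expr = plus(expr, re.escape(a))
            R[i, j] = plus(EPS, expr) if i == j else expr
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).
    for k in states:
        R = {(i, j): plus(R[i, j], dot(dot(R[i, k], star(R[k, k])), R[k, j]))
             for i in states for j in states}
    # Union of R_{start,j}^(n) over the accepting states j.
    result = EMPTY
    for j in accepting:
        result = plus(result, R[start, j])
    return result


def check(pattern, predicate, alphabet, max_len):
    """Brute-force test: pattern fully matches w exactly when predicate(w) holds."""
    for length in range(max_len + 1):
        for t in product(alphabet, repeat=length):
            w = "".join(t)
            matched = pattern is not EMPTY and re.fullmatch(pattern, w) is not None
            if matched != predicate(w):
                return False
    return True
```

For the two-state DFA of Example 3.5 (start state 1, a loop on 1 at state 1, an arc on 0 to accepting state 2, and a loop on 0 and 1 at state 2), check(dfa_to_regex(...), lambda w: "0" in w, "01", 7) confirms that the resulting pattern matches exactly the strings with at least one 0, up to length 7.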


Example 3.5: Let us convert the DFA of Fig. 3.4 to a regular expression.
This DFA accepts all strings that have at least one 0 in them. To see why, note
that the automaton goes from the start state 1 to accepting state 2 as soon as
it sees an input 0. The automaton then stays in state 2 on all input sequences.
Below are the basis expressions in the construction of Theorem 3.4.


$R_{11}^{(0)}$ = ε + 1
$R_{12}^{(0)}$ = 0
$R_{21}^{(0)}$ = ∅
$R_{22}^{(0)}$ = ε + 0 + 1


Figure 3.4: A DFA accepting all strings that have at least one 0


For instance, $R_{11}^{(0)}$ has the term ε because the beginning and ending states are
the same, state 1. It has the term 1 because there is an arc from state 1 to state
1 on input 1. As another example, $R_{12}^{(0)}$ is 0 because there is an arc labeled 0
from state 1 to state 2. There is no ε term because the beginning and ending
states are different. For a third example, $R_{21}^{(0)} = \emptyset$, because there is no arc from
state 2 to state 1.

Now, we must do the induction part, building more complex expressions
that first take into account paths that go through state 1, and then paths that
can go through states 1 and 2, i.e., any path. The rules for computing the
expressions $R_{ij}^{(1)}$ are instances of the general rule given in the inductive part of
Theorem 3.4:


$$R_{ij}^{(1)} = R_{ij}^{(0)} + R_{i1}^{(0)}\left(R_{11}^{(0)}\right)^* R_{1j}^{(0)} \qquad (3.1)$$


The table in Fig. 3.5 gives first the expressions computed by direct substitution 
into the above formula, and then a simplified expression that we can show, by 
ad-hoc reasoning, to represent the same language as the more complex expres- 
sion. 


               | By direct substitution            | Simplified
$R_{11}^{(1)}$ | ε + 1 + (ε + 1)(ε + 1)*(ε + 1)    | 1*
$R_{12}^{(1)}$ | 0 + (ε + 1)(ε + 1)*0              | 1*0
$R_{21}^{(1)}$ | ∅ + ∅(ε + 1)*(ε + 1)              | ∅
$R_{22}^{(1)}$ | ε + 0 + 1 + ∅(ε + 1)*0            | ε + 0 + 1


Figure 3.5: Regular expressions for paths that can go through only state 1


For example, consider $R_{12}^{(1)}$. Its expression is $R_{12}^{(0)} + R_{11}^{(0)}\left(R_{11}^{(0)}\right)^* R_{12}^{(0)}$, which
we get from (3.1) by substituting i = 1 and j = 2.

To understand the simplification, note the general principle that if R is any
regular expression, then (ε + R)* = R*. The justification is that both sides of
the equation describe the language consisting of any concatenation of zero or
more strings from L(R). In our case, we have (ε + 1)* = 1*; notice that both
expressions denote any number of 1’s. Further, (ε + 1)1* = 1*. Again, it can be
observed that both expressions denote “any number of 1’s.” Thus, the original
expression $R_{12}^{(1)}$ is equivalent to 0 + 1*0. This expression denotes the language
containing the string 0 and all strings consisting of a 0 preceded by any number
of 1’s. This language is also expressed by the simpler expression 1*0.


The simplification of $R_{11}^{(1)}$ is similar to the simplification of $R_{12}^{(1)}$ that we just
considered. The simplification of $R_{21}^{(1)}$ and $R_{22}^{(1)}$ depends on two rules about
how ∅ operates. For any regular expression R:


1. ∅R = R∅ = ∅. That is, ∅ is an annihilator for concatenation; it results in
itself when concatenated, either on the left or right, with any expression.
This rule makes sense, because for a string to be in the result of a concate-
nation, we must find strings from both arguments of the concatenation.
Whenever one of the arguments is ∅, it will be impossible to find a string
from that argument.


2. ∅ + R = R + ∅ = R. That is, ∅ is the identity for union; it results in the
other expression whenever it appears in a union.


As a result, an expression like ∅(ε + 1)*(ε + 1) can be replaced by ∅. The last
two simplifications should now be clear.
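Simplifications like these are easy to get wrong by hand, so it can help to test them on all short strings. This Python sketch transcribes each row of Fig. 3.5 into re syntax, writing ε as an empty alternative and ∅ as (?!), a lookahead that can never succeed, and compares the two columns on every string of 0's and 1's up to a length bound. The names are our own; a bounded check like this can expose a wrong simplification but cannot prove a right one.

```python
import re
from itertools import product

NEVER = "(?!)"  # a pattern that matches nothing, standing in for ∅


def equivalent_up_to(p, q, alphabet, max_len):
    """True if patterns p and q fully match the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return False
    return True


# Fig. 3.5 in Python syntax: + becomes |, ε + 1 becomes (?:|1), ∅ becomes NEVER.
FIG_3_5 = [
    ("|1|(?:|1)(?:|1)*(?:|1)", "1*"),              # R_11^(1)
    ("0|(?:|1)(?:|1)*0", "1*0"),                    # R_12^(1)
    (f"{NEVER}|{NEVER}(?:|1)*(?:|1)", NEVER),       # R_21^(1)
    (f"|0|1|{NEVER}(?:|1)*0", "|0|1"),              # R_22^(1)
]
```

Every pair in FIG_3_5 passes equivalent_up_to(p, q, "01", 8), whereas an incorrect simplification such as replacing 1*0 by 1* fails on the empty string.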


Now, let us compute the expressions $R_{ij}^{(2)}$. The inductive rule applied with
k = 2 gives us:


$$R_{ij}^{(2)} = R_{ij}^{(1)} + R_{i2}^{(1)}\left(R_{22}^{(1)}\right)^* R_{2j}^{(1)} \qquad (3.2)$$


If we substitute the simplified expressions from Fig. 3.5 into (3.2), we get the 
expressions of Fig. 3.6. That figure also shows simplifications following the same 
principles that we described for Fig. 3.5. 


               | By direct substitution                          | Simplified
$R_{11}^{(2)}$ | 1* + 1*0(ε + 0 + 1)*∅                           | 1*
$R_{12}^{(2)}$ | 1*0 + 1*0(ε + 0 + 1)*(ε + 0 + 1)                | 1*0(0 + 1)*
$R_{21}^{(2)}$ | ∅ + (ε + 0 + 1)(ε + 0 + 1)*∅                    | ∅
$R_{22}^{(2)}$ | ε + 0 + 1 + (ε + 0 + 1)(ε + 0 + 1)*(ε + 0 + 1)  | (0 + 1)*


Figure 3.6: Regular expressions for paths that can go through any state


The final regular expression equivalent to the automaton of Fig. 3.4 is con- 
structed by taking the union of all the expressions where the first state is the 
start state and the second state is accepting. In this example, with 1 as the 
start state and 2 as the only accepting state, we need only the expression $R_{12}^{(2)}$.
This expression is 1*0(0 + 1)*. It is simple to interpret this expression. Its 
language consists of all strings that begin with zero or more 1’s, then have a 0, 
and then any string of 0’s and 1’s. Put another way, the language is all strings 
of 0’s and 1’s with at least one 0. 
3.2.2 Converting DFA’s to Regular Expressions by Eliminating States


The method of Section 3.2.1 for converting a DFA to a regular expression al- 
ways works. In fact, as you may have noticed, it doesn’t really depend on the 
automaton being deterministic, and could just as well have been applied to an 
NFA or even an ε-NFA. However, the construction of the regular expression
is expensive. Not only do we have to construct about $n^3$ expressions for an
n-state automaton, but the length of the expression can grow by a factor of 4
on the average, with each of the n inductive steps, if there is no simplification
of the expressions. Thus, the expressions themselves could reach on the order
of $4^n$ symbols.

There is a similar approach that avoids duplicating work at some points.
For example, for every i and j, the formula for $R_{ij}^{(k)}$ in the construction of
Theorem 3.4 uses the subexpression $\left(R_{kk}^{(k-1)}\right)^*$; the work of writing that expression
is therefore repeated $n^2$ times.

The approach to constructing regular expressions that we shall now learn 
involves eliminating states. When we eliminate a state s, all the paths that went 
through s no longer exist in the automaton. If the language of the automaton 
is not to change, we must include, on an arc that goes directly from q to p, 
the labels of paths that went from some state q to state p, through s. Since 
the label of this arc may now involve strings, rather than single symbols, and 
there may even be an infinite number of such strings, we cannot simply list the 
strings as a label. Fortunately, there is a simple, finite way to represent all such 
strings: use a regular expression. 

Thus, we are led to consider automata that have regular expressions as 
labels. The language of the automaton is the union over all paths from the 
start state to an accepting state of the language formed by concatenating the 
languages of the regular expressions along that path. Note that this rule is 
consistent with the definition of the language for any of the varieties of automata 
we have considered so far. Each symbol a, or ε if it is allowed, can be thought
of as a regular expression whose language is a single string, either {a} or {ε}.
We may regard this observation as the basis of a state-elimination procedure, 
which we describe next. 

Figure 3.7 shows a generic state s about to be eliminated. We suppose that
the automaton of which s is a state has predecessor states $q_1, q_2, \ldots, q_k$ for s
and successor states $p_1, p_2, \ldots, p_m$ for s. It is possible that some of the q’s are
also p’s, but we assume that s is not among the q’s or p’s, even if there is a loop
from s to itself, as suggested by Fig. 3.7. We also show a regular expression on
each arc from one of the q’s to s; expression $Q_i$ labels the arc from $q_i$. Likewise,
we show a regular expression $P_i$ labeling the arc from s to $p_i$, for all i. We show
a loop on s with label S. Finally, there is a regular expression $R_{ij}$ on the arc
from $q_i$ to $p_j$, for all i and j. Note that some of these arcs may not exist in the
automaton, in which case we take the expression on that arc to be ∅.

Figure 3.7: A state s about to be eliminated


Figure 3.8 shows what happens when we eliminate state s. All arcs involving
state s are deleted. To compensate, we introduce, for each predecessor $q_i$ of s
and each successor $p_j$ of s, a regular expression that represents all the paths
that start at $q_i$, go to s, perhaps loop around s zero or more times, and finally
go to $p_j$. The expression for these paths is $Q_i S^* P_j$. This expression is added
(with the union operator) to the arc from $q_i$ to $p_j$, whose label therefore becomes
$R_{ij} + Q_i S^* P_j$. If there was no arc $q_i \to p_j$, then first introduce one with
regular expression ∅.
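A single elimination step is short in code. The Python sketch below is one possible encoding, not the book's: arcs maps a (source, target) pair to a label written in Python's re syntax, a missing pair stands for an arc labeled ∅, and ∅* is replaced by ε (simply omitted), just as the text does by hand. The state names and labels in the usage example are hypothetical, chosen to mirror Fig. 3.7 with one predecessor and one successor.

```python
def eliminate(arcs, s):
    """Remove state s, unioning Q_i S* P_j onto each arc q_i -> p_j."""
    loop = arcs.get((s, s))                              # S, or None for ∅
    star = f"({loop})*" if loop is not None else ""      # ∅* = ε, so drop it
    preds = [q for (q, t) in arcs if t == s and q != s]  # the q_i
    succs = [p for (t, p) in arcs if t == s and p != s]  # the p_j
    result = {arc: label for arc, label in arcs.items() if s not in arc}
    for q in preds:
        for p in succs:
            path = f"({arcs[(q, s)]}){star}({arcs[(s, p)]})"
            # R_ij + Q_i S* P_j; if R_ij is ∅, the union is just the new path.
            result[(q, p)] = f"{result[(q, p)]}|{path}" if (q, p) in result else path
    return result

# Hypothetical labels: Q = a, S = b, P = c, R = d.
arcs = {("q", "s"): "a", ("s", "s"): "b", ("s", "p"): "c", ("q", "p"): "d"}
print(eliminate(arcs, "s"))  # {('q', 'p'): 'd|(a)(b)*(c)'}
```

Because labels are kept as strings and parenthesized whenever they are concatenated or starred, the result can be handed directly to Python's re module for testing.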

The strategy for constructing a regular expression from a finite automaton 
is as follows: 


1. For each accepting state q, apply the above reduction process to produce
an equivalent automaton with regular-expression labels on the arcs.
Eliminate all states except q and the start state $q_0$.


2. If $q \ne q_0$, then we shall be left with a two-state automaton that looks like
Fig. 3.9. The regular expression for the accepted strings can be described
in various ways. One is (R + SU*T)*SU*. In explanation, we can go
from the start state to itself any number of times, by following a sequence
of paths whose labels are in either L(R) or L(SU*T). The expression
SU*T represents paths that go to the accepting state via a path in L(S),
perhaps return to the accepting state several times using a sequence of
paths with labels in L(U), and then return to the start state with a path
whose label is in L(T). Then we must go to the accepting state, never to
return to the start state, by following a path with a label in L(S). Once
in the accepting state, we can return to it as many times as we like, by
following a path whose label is in L(U).


Figure 3.8: Result of eliminating state s from Fig. 3.7




Figure 3.9: A generic two-state automaton 


3. If the start state is also an accepting state, then we must also perform 
a state-elimination from the original automaton that gets rid of every 
state but the start state. When we do so, we are left with a one-state 
automaton that looks like Fig. 3.10. The regular expression denoting the 
strings that it accepts is R*. 




Figure 3.10: A generic one-state automaton 


4. The desired regular expression is the sum (union) of all the expressions 
derived from the reduced automata for each accepting state, by rules (2) 
and (3). 
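Steps (1) through (4) can be combined into a small program. The Python sketch below is our own, under the same assumptions as before (labels in Python's re syntax, a missing arc meaning ∅, ε written as the empty string); the function names are hypothetical. It also follows the advice of the box "Ordering the Elimination of States": states that are neither start nor accepting are eliminated once, before branching on the accepting states. The usage example is the NFA of Example 3.6 (Fig. 3.12).

```python
import itertools
import re

def eliminate(arcs, s):
    """Remove state s, unioning Q S* P onto each arc q -> p (missing arc = ∅)."""
    loop = arcs.get((s, s))
    star = f"({loop})*" if loop is not None else ""
    preds = [q for (q, t) in arcs if t == s and q != s]
    succs = [p for (t, p) in arcs if t == s and p != s]
    result = {arc: label for arc, label in arcs.items() if s not in arc}
    for q in preds:
        for p in succs:
            path = f"({arcs[(q, s)]}){star}({arcs[(s, p)]})"
            result[(q, p)] = f"{result[(q, p)]}|{path}" if (q, p) in result else path
    return result

def two_state(arcs, start, acc):
    """(R + SU*T)*SU* for Fig. 3.9, dropping ∅ terms; None means ∅."""
    R, S, T, U = (arcs.get(a) for a in
                  [(start, start), (start, acc), (acc, start), (acc, acc)])
    if S is None:
        return None
    u_star = f"({U})*" if U is not None else ""
    loops = [f"({R})"] if R is not None else []
    if T is not None:
        loops.append(f"({S}){u_star}({T})")
    prefix = "(" + "|".join(loops) + ")*" if loops else ""
    return f"{prefix}({S}){u_star}"

def to_regex(states, arcs, start, accepting):
    # States that are neither start nor accepting are eliminated once for all.
    for s in states:
        if s != start and s not in accepting:
            arcs = eliminate(arcs, s)
    pieces = []
    for q in accepting:
        reduced = arcs
        for s in accepting:
            if s not in (q, start):
                reduced = eliminate(reduced, s)
        if q == start:                          # rule (3): one state, R*
            loop = reduced.get((start, start))
            pieces.append(f"({loop})*" if loop is not None else "")
        else:                                   # rule (2): two states
            piece = two_state(reduced, start, q)
            if piece is not None:
                pieces.append(piece)
    # Rule (4): the union of the pieces; with no pieces the language is ∅.
    return "|".join(f"(?:{p})" for p in pieces) if pieces else r"[^\s\S]"

# Example 3.6: A is the start state; C and D accept.
arcs = {("A", "A"): "0|1", ("A", "B"): "1", ("B", "C"): "0|1", ("C", "D"): "0|1"}
regex = to_regex("ABCD", arcs, "A", ["C", "D"])
print(regex)
ok = all(
    (re.fullmatch(regex, w) is not None)
    == ((len(w) >= 2 and w[-2] == "1") or (len(w) >= 3 and w[-3] == "1"))
    for n in range(9)
    for w in map("".join, itertools.product("01", repeat=n))
)
print(ok)  # True
```

No algebraic simplification is attempted beyond dropping ∅ terms, so the output carries extra parentheses, but it is the same expression the text derives by hand for Example 3.6.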
Figure 3.11: An NFA accepting strings that have a 1 either two or three positions
from the end


Example 3.6: Let us consider the NFA in Fig. 3.11 that accepts all strings of 
0’s and 1’s such that either the second or third position from the end has a 1. 
Our first step is to convert it to an automaton with regular expression labels. 
Since no state elimination has been performed, all we have to do is replace the 
labels “0,1” with the equivalent regular expression 0 + 1. The result is shown 
in Fig. 3.12. 




Figure 3.12: The automaton of Fig. 3.11 with regular-expression labels 


Let us first eliminate state B. Since this state is neither accepting nor 
the start state, it will not be in any of the reduced automata. Thus, we save 
work if we eliminate it first, before developing the two reduced automata that 
correspond to the two accepting states. 

State B has one predecessor, A, and one successor, C. In terms of the
regular expressions in the diagram of Fig. 3.7: $Q_1 = 1$, $P_1 = 0 + 1$, $R_{11} = \emptyset$
(since the arc from A to C does not exist), and $S = \emptyset$ (because there is no
loop at state B). As a result, the expression on the new arc from A to C is
∅ + 1∅*(0 + 1).

To simplify, we first eliminate the initial ∅, which may be ignored in a union.
The expression thus becomes 1∅*(0 + 1). Note that the regular expression ∅*
is equivalent to the regular expression ε, since

$$L(\emptyset^*) = \{\epsilon\} \cup L(\emptyset) \cup L(\emptyset)L(\emptyset) \cup \cdots$$

Since all the terms but the first are empty, we see that L(∅*) = {ε}, which
is the same as L(ε). Thus, 1∅*(0 + 1) is equivalent to 1(0 + 1), which is the
expression we use for the arc A → C in Fig. 3.13.

Now, we must branch, eliminating states C and D in separate reductions. 
To eliminate state C, the mechanics are similar to those we performed above 
to eliminate state B, and the resulting automaton is shown in Fig. 3.14. 

Figure 3.13: Eliminating state B


Figure 3.14: A two-state automaton with states A and D


In terms of the generic two-state automaton of Fig. 3.9, the regular expressions
from Fig. 3.14 are: R = 0 + 1, S = 1(0 + 1)(0 + 1), T = ∅, and U = ∅.
The expression U* can be replaced by ε, i.e., eliminated in a concatenation;
the justification is that ∅* = ε, as we discussed above. Also, the expression
SU*T is equivalent to ∅, since T, one of the terms of the concatenation, is ∅.
The generic expression (R + SU*T)*SU* thus simplifies in this case to R*S,
or (0 + 1)*1(0 + 1)(0 + 1). In informal terms, the language of this expression
is all strings of 0’s and 1’s that end in a 1 followed by exactly two more symbols,
each either 0 or 1. That language is one portion of the strings accepted by the
automaton of Fig. 3.11: those strings whose third position from the end has a 1.

Now, we must start again at Fig. 3.13 and eliminate state D instead of C. 
Since D has no successors, an inspection of Fig. 3.7 tells us that there will be 
no changes to arcs, and the arc from C to D is eliminated, along with state D. 
The resulting two-state automaton is shown in Fig. 3.15. 

This automaton is very much like that of Fig. 3.14; only the label on the arc 
from the start state to the accepting state is different. Thus, we can apply the 
rule for two-state automata and simplify the expression to get (0 + 1)*1(0 + 1).
This expression represents the other type of string the automaton accepts: those 
with a 1 in the second position from the end. 

All that remains is to sum the two expressions to get the expression for the 
entire automaton of Fig. 3.11. This expression is 


(0 + 1)*1(0 + 1) + (0 + 1)*1(0 + 1)(0 + 1)
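As a sanity check, the sum above can be compared by brute force with the description of Fig. 3.11 (a 1 in the second or third position from the end). This is a minimal Python sketch using the re module; the helper name has_1_near_end is our own.

```python
import itertools
import re

# The expression derived above, in Python syntax (| for +).
FINAL = re.compile(r"(0|1)*1(0|1)|(0|1)*1(0|1)(0|1)")

def has_1_near_end(s):
    """A 1 in the second or third position from the end (Fig. 3.11)."""
    return (len(s) >= 2 and s[-2] == "1") or (len(s) >= 3 and s[-3] == "1")

def check(max_len):
    """Compare the expression with the description on all short strings."""
    return all(
        (FINAL.fullmatch(s) is not None) == has_1_near_end(s)
        for n in range(max_len + 1)
        for s in map("".join, itertools.product("01", repeat=n))
    )

print(check(10))  # True
```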


3.2.3 Converting Regular Expressions to Automata 


We shall now complete the plan of Fig. 3.1 by showing that every language L
that is L(R) for some regular expression R is also L(E) for some ε-NFA E. The
proof is a structural induction on the expression R. We start by showing how
to construct automata for the basis expressions: single symbols, ε, and ∅. We
then show how to combine these automata into larger automata that accept the
union, concatenation, or closure of the languages accepted by smaller automata.

All of the automata we construct are ε-NFA’s with a single accepting state.


Ordering the Elimination of States


As we observed in Example 3.6, when a state is neither the start state 
nor an accepting state, it gets eliminated in all the derived automata. 
Thus, one of the advantages of the state-elimination process compared 
with the mechanical generation of regular expressions that we described 
in Section 3.2.1 is that we can start by eliminating all the states that 
are neither start nor accepting, once and for all. We only have to begin 
duplicating the reduction effort when we need to eliminate some accepting 
states. 

Even there, we can combine some of the effort. For instance, if there 
are three accepting states p, q, and r, we can eliminate p and then branch
to eliminate either q or r, thus producing the automata for accepting states 
r and q, respectively. We then start again with all three accepting states 
and eliminate both q and r to get the automaton for p. 




Figure 3.15: Two-state automaton resulting from the elimination of D 


Theorem 3.7: Every language defined by a regular expression is also defined 
by a finite automaton. 


PROOF: Suppose L = L(R) for a regular expression R. We show that L = L(E)
for some ε-NFA E with:


1. Exactly one accepting state. 
2. No arcs into the initial state. 


3. No arcs out of the accepting state. 


The proof is by structural induction on R, following the recursive definition of 
regular expressions that we had in Section 3.1.2. 


BASIS: There are three parts to the basis, shown in Fig. 3.16. In part (a) we
see how to handle the expression ε. The language of the automaton is easily
seen to be {ε}, since the only path from the start state to an accepting state
is labeled ε. Part (b) shows the construction for ∅. Clearly there are no paths
from start state to accepting state, so ∅ is the language of this automaton.
Finally, part (c) gives the automaton for a regular expression a. The language
of this automaton evidently consists of the one string a, which is also L(a). It
is easy to check that these automata all satisfy conditions (1), (2), and (3) of
the inductive hypothesis.


Figure 3.16: The basis of the construction of an automaton from a regular
expression


INDUCTION: The constructions for the first three cases of the induction are
shown in Fig. 3.17; the fourth case needs no new construction. We assume that
the statement of the theorem is true for the immediate subexpressions of a given
regular expression; that is, the languages of these subexpressions are also the
languages of ε-NFA’s satisfying conditions (1), (2), and (3). The four cases
are:


1. The expression is R + S for some smaller expressions R and S. Then the 
automaton of Fig. 3.17(a) serves. That is, starting at the new start state, 
we can go to the start state of either the automaton for R or the automa- 
ton for S. We then reach the accepting state of one of these automata, 
following a path labeled by some string in L(R) or L(S), respectively. 
Once we reach the accepting state of the automaton for R or S, we can
follow one of the ε-arcs to the accepting state of the new automaton.
Thus, the language of the automaton in Fig. 3.17(a) is L(R) ∪ L(S).


2. The expression is RS for some smaller expressions R and S. The automa- 
ton for the concatenation is shown in Fig. 3.17(b). Note that the start 
state of the first automaton becomes the start state of the whole, and the 
accepting state of the second automaton becomes the accepting state of 
the whole. The idea is that the only paths from start to accepting state go 
first through the automaton for R, where it must follow a path labeled by 
a string in L(R), and then through the automaton for S, where it follows 
a path labeled by a string in L(S). Thus, the paths in the automaton of 
Fig. 3.17(b) are all and only those labeled by strings in L(R)L(S). 


Figure 3.17: The inductive step in the regular-expression-to-ε-NFA construction


3. The expression is R* for some smaller expression R. Then we use the
automaton of Fig. 3.17(c). That automaton allows us to go either:


(a) Directly from the start state to the accepting state along a path
labeled ε. That path lets us accept ε, which is in L(R*) no matter
what expression R is.


(b) To the start state of the automaton for R, through that automaton
one or more times, and then to the accepting state. This set of paths
allows us to accept strings in L(R), L(R)L(R), L(R)L(R)L(R), and
so on, thus covering all strings in L(R*) except perhaps ε, which was
covered by the direct arc to the accepting state mentioned in (3a).


4. The expression is (R) for some smaller expression R. The automaton 
for R also serves as the automaton for (R), since the parentheses do not 
change the language defined by the expression. 


It is a simple observation that the constructed automata satisfy the three con- 
ditions given in the inductive hypothesis — one accepting state, with no arcs 
into the initial state or out of the accepting state. 
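The constructions of Figs. 3.16 and 3.17 translate almost line for line into code. The Python sketch below is one way to do it, assuming (our choice, not the book's) that an expression has already been parsed into nested tuples such as ("star", ("sym", "0")); it also simulates the resulting ε-NFA by ε-closure so the construction can be tested. Because build creates fresh states on every call, building the same subexpression twice yields two disjoint copies, which is the point Example 3.8 below makes about not reusing automata.

```python
import itertools
from dataclasses import dataclass

@dataclass
class ENFA:
    """An ε-NFA with one start and one accepting state (Theorem 3.7)."""
    start: int
    accept: int
    moves: list  # triples (from, symbol, to); symbol None stands for ε

class Builder:
    """Builds ε-NFA's for expressions given as nested tuples."""

    def __init__(self):
        self.count = 0  # number of states created so far

    def new_state(self):
        self.count += 1
        return self.count - 1

    def build(self, expr):
        kind = expr[0]
        if kind in ("eps", "empty", "sym"):          # basis: Fig. 3.16
            s, f = self.new_state(), self.new_state()
            if kind == "eps":
                return ENFA(s, f, [(s, None, f)])
            if kind == "sym":
                return ENFA(s, f, [(s, expr[1], f)])
            return ENFA(s, f, [])                    # ∅: no path at all
        if kind == "paren":                          # case 4: same automaton
            return self.build(expr[1])
        if kind == "star":                           # case 3: Fig. 3.17(c)
            r = self.build(expr[1])
            s, f = self.new_state(), self.new_state()
            return ENFA(s, f, r.moves + [(s, None, r.start), (s, None, f),
                                         (r.accept, None, r.start),
                                         (r.accept, None, f)])
        r, t = self.build(expr[1]), self.build(expr[2])
        if kind == "concat":                         # case 2: Fig. 3.17(b)
            return ENFA(r.start, t.accept,
                        r.moves + t.moves + [(r.accept, None, t.start)])
        if kind == "union":                          # case 1: Fig. 3.17(a)
            s, f = self.new_state(), self.new_state()
            return ENFA(s, f, r.moves + t.moves + [
                (s, None, r.start), (s, None, t.start),
                (r.accept, None, f), (t.accept, None, f)])
        raise ValueError(f"unknown expression kind: {kind}")

def closure(states, moves):
    """All states reachable from the given states along ε-arcs alone."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, a, r in moves:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    current = closure({nfa.start}, nfa.moves)
    for c in w:
        step = {r for p, a, r in nfa.moves if p in current and a == c}
        current = closure(step, nfa.moves)
    return nfa.accept in current

# (0 + 1)*1(0 + 1), the expression of Example 3.8.
zero_or_one = ("union", ("sym", "0"), ("sym", "1"))
expr = ("concat", ("concat", ("star", zero_or_one), ("sym", "1")), zero_or_one)
builder = Builder()
nfa = builder.build(expr)
print(builder.count)  # 16
print(all(accepts(nfa, w) == (len(w) >= 2 and w[-2] == "1")
          for n in range(8)
          for w in map("".join, itertools.product("01", repeat=n))))  # True
```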
Figure 3.18: Automata constructed for Example 3.8


Example 3.8: Let us convert the regular expression (0 + 1)*1(0 + 1) to an 
ε-NFA. Our first step is to construct an automaton for 0 + 1. We use two
automata constructed according to Fig. 3.16(c), one with label 0 on the arc 
and one with label 1. These two automata are then combined using the union 
construction of Fig. 3.17(a). The result is shown in Fig. 3.18(a). 


Next, we apply to Fig. 3.18(a) the star construction of Fig. 3.17(c). This 
automaton is shown in Fig. 3.18(b). The last two steps involve applying the 
concatenation construction of Fig. 3.17(b). First, we connect the automaton of 
Fig. 3.18(b) to another automaton designed to accept only the string 1. This 
automaton is another application of the basis construction of Fig. 3.16(c) with 
label 1 on the arc. Note that we must create a new automaton to recognize 1; 
we must not use the automaton for 1 that was part of Fig. 3.18(a). The third 
automaton in the concatenation is another automaton for 0 + 1. Again, we
must create a copy of the automaton of Fig. 3.18(a); we must not use the same
copy that became part of Fig. 3.18(b). The complete automaton is shown in
Fig. 3.18(c). Note that this ε-NFA, when ε-transitions are removed, looks just
like the much simpler automaton of Fig. 3.15 that also accepts the strings that
have a 1 in their next-to-last position.


3.2.4 Exercises for Section 3.2 


Exercise 3.2.1: Here is a transition table for a DFA: 



* a) Give all the regular expressions $R_{ij}^{(0)}$. Note: Think of state $q_i$ as if it
were the state with integer number i.


* b) Give all the regular expressions $R_{ij}^{(1)}$. Try to simplify the expressions as
much as possible.


c) Give all the regular expressions $R_{ij}^{(2)}$. Try to simplify the expressions as
much as possible.


d) Give a regular expression for the language of the automaton.


* e) Construct the transition diagram for the DFA and give a regular expression
for its language by eliminating state $q_2$.


Exercise 3.2.2: Repeat Exercise 3.2.1 for the following DFA: 


Note that solutions to parts (a), (b) and (e) are not available for this exercise. 


Exercise 3.2.3: Convert the following DFA to a regular expression, using the 
state-elimination technique of Section 3.2.2. 


Exercise 3.2.4: Convert the following regular expressions to NFA’s with
ε-transitions.


* a) 01*. 
b) (0 + 1)01.
c) 00(0 + 1)*.


Exercise 3.2.5: Eliminate ε-transitions from your ε-NFA’s of Exercise 3.2.4.
A solution to part (a) appears in the book’s Web pages.


Exercise 3.2.6: Let $A = (Q, \Sigma, \delta, q_0, \{q_f\})$ be an ε-NFA such that there are no
transitions into $q_0$ and no transitions out of $q_f$. Describe the language accepted
by each of the following modifications of A, in terms of L = L(A):


* a) The automaton constructed from A by adding an ε-transition from $q_f$ to
$q_0$.


* b) The automaton constructed from A by adding an ε-transition from $q_0$
to every state reachable from $q_0$ (along a path whose labels may include
symbols of Σ as well as ε).


c) The automaton constructed from A by adding an ε-transition to $q_f$ from
every state that can reach $q_f$ along some path.


d) The automaton constructed from A by doing both (b) and (c). 


Exercise 3.2.7: There are some simplifications to the constructions of
Theorem 3.7, where we converted a regular expression to an ε-NFA. Here are three:


1. For the union operator, instead of creating new start and accepting states, 
merge the two start states into one state with all the transitions of both 
start states. Likewise, merge the two accepting states, having all transi- 
tions to either go to the merged state instead. 


2. For the concatenation operator, merge the accepting state of the first 
automaton with the start state of the second. 


3. For the closure operator, simply add ε-transitions from the accepting state
to the start state and vice-versa.


Each of these simplifications, by itself, still yields a correct construction;
that is, the resulting ε-NFA for any regular expression accepts the language of
the expression. Which subsets of changes (1), (2), and (3) may be made to the 
construction together, while still yielding a correct automaton for every regular 
expression? 


Exercise 3.2.8: Give an algorithm that takes a DFA A and computes the 
number of strings of length n (for some given n, not related to the number 
of states of A) accepted by A. Your algorithm should be polynomial in both 
n and the number of states of A. Hint: Use the technique suggested by the 
construction of Theorem 3.4. 
3.3 Applications of Regular Expressions


A regular expression that gives a “picture” of the pattern we want to recognize 
is the medium of choice for applications that search for patterns in text. The 
regular expressions are then compiled, behind the scenes, into deterministic or 
nondeterministic automata, which are then simulated to produce a program 
that recognizes patterns in text. In this section, we shall consider two impor- 
tant classes of regular-expression-based applications: lexical analyzers and text 
search. 


3.3.1 Regular Expressions in UNIX 


Before seeing the applications, we shall introduce the UNIX notation for ex- 
tended regular expressions. This notation gives us a number of additional ca- 
pabilities. In fact, the UNIX extensions include certain features, especially the 
ability to name and refer to previous strings that have matched a pattern, that 
actually allow nonregular languages to be recognized. We shall not consider 
these features here; rather we shall only introduce the shorthands that allow 
complex regular expressions to be written succinctly. 

The first enhancement to the regular-expression notation concerns the fact 
that most real applications deal with the ASCII character set. Our examples 
have typically used a small alphabet, such as {0,1}. The existence of only two 
symbols allowed us to write succinct expressions like 0+ 1 for “any character.” 
However, if there were 128 characters, say, the same expression would involve 
listing them all, and would be highly inconvenient to write. Thus, UNIX reg- 
ular expressions allow us to write character classes to represent large sets of 
characters as succinctly as possible. The rules for character classes are: 


• The symbol . (dot) stands for “any character.”

• The sequence [$a_1 a_2 \cdots a_k$] stands for the regular expression
$a_1 + a_2 + \cdots + a_k$. This notation saves about half the characters,
since we don’t have to write the +-signs. For example, we could express
the four characters used in C comparison operators by [<>=!].


• Between the square braces we can put a range of the form x-y to mean all
the characters from x to y in the ASCII sequence. Since the digits have
codes in order, as do the upper-case letters and the lower-case letters, we
can express many of the classes of characters that we really care about
with just a few keystrokes. For example, the digits can be expressed
[0-9], the upper-case letters can be expressed [A-Z], and the set of all
letters and digits can be expressed [A-Za-z0-9]. If we want to include a
minus sign among a list of characters, we can place it first or last, so it is
not confused with its use to form a character range. For example, the set
of digits, plus the dot, plus, and minus signs that are used to form signed
decimal numbers may be expressed [-+.0-9]. Square brackets, or other
characters that have special meanings in UNIX regular expressions, can
be represented as characters by preceding them with a backslash (\).


• There are special notations for several of the most common classes of
characters. For instance:
a) [:digit:] is the set of ten digits, the same as [0-9].³
b) [:alpha:] stands for any alphabetic character, as does [A-Za-z].
c) [:alnum:] stands for the digits and letters (alphabetic and numeric
characters), as does [A-Za-z0-9].
In POSIX tools such as grep, these names are written inside a bracket
expression, as in [[:digit:]].
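Python's re module accepts the same bracket notation for explicit lists and ranges, so the identities above can be checked directly. This is a sketch for illustration only; two differences from UNIX tools are worth knowing: Python's re does not support POSIX names such as [:digit:], and its dot does not match a newline unless the re.DOTALL flag is given.

```python
import re

ASCII = [chr(i) for i in range(128)]

def same_chars(p1, p2):
    """True if two single-character patterns accept the same ASCII characters."""
    return all(
        (re.fullmatch(p1, c) is None) == (re.fullmatch(p2, c) is None)
        for c in ASCII
    )

# [<>=!] abbreviates the union of its four characters.
print(same_chars(r"[<>=!]", r"<|>|=|!"))               # True
# A range lists every character between its endpoints.
print(same_chars(r"[0-9]", r"0|1|2|3|4|5|6|7|8|9"))    # True
# A leading minus is an ordinary character, not part of a range.
print(same_chars(r"[-+.0-9]", r"-|\+|\.|[0-9]"))       # True
```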


In addition, there are several operators that are used in UNIX regular ex- 
pressions that we have not encountered previously. None of these operators 
extends the class of languages that can be expressed, but they sometimes make it easier to
express what we want. 


1. The operator | is used in place of + to denote union. 


2. The operator ? means “zero or one of.” Thus, R? in UNIX is the same 
as ε + R in this book’s regular-expression notation.


3. The operator + means “one or more of.” Thus, R+ in UNIX is shorthand 
for RR* in our notation. 


4. The operator {n} means “n copies of.” Thus, R{5} in UNIX is shorthand 
for RRRRR. 


Note that UNIX regular expressions allow parentheses to group subexpressions, 
just as for the regular expressions described in Section 3.1.2, and the same 
operator precedence is used (with ?, + and {n} treated like * as far as precedence 
is concerned). The star operator * is used in UNIX (without being a superscript, 
of course) with the same meaning as we have used. 
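Each of the operators ?, +, and {n} can be checked against its expansion by brute force. The Python sketch below does so for a sample expression R = (ab|b) over the alphabet {a, b}; both choices are arbitrary, and Python's re writes these operators the same way as UNIX extended regular expressions do.

```python
import itertools
import re

def equivalent(p1, p2, alphabet="ab", max_len=8):
    """Compare two patterns on every string over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if (re.fullmatch(p1, s) is None) != (re.fullmatch(p2, s) is None):
                return False
    return True

R = "(ab|b)"
print(equivalent(R + "?", "(|" + R + ")"))   # R? is ε + R: True
print(equivalent(R + "+", R + R + "*"))      # R+ is RR*: True
print(equivalent(R + "{5}", R * 5))          # R{5} is RRRRR: True
```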


3.3.2 Lexical Analysis 


One of the oldest applications of regular expressions was in specifying the com- 
ponent of a compiler called a “lexical analyzer.” This component scans the 
source program and recognizes all tokens, those substrings of consecutive char- 
acters that belong together logically. Keywords and identifiers are common 
examples of tokens, but there are many others. 


³ The notation [:digit:] has the advantage that, should some code other than ASCII be
used, including a code where the digits did not have consecutive codes, [:digit:] would still
represent [0123456789], while [0-9] would represent whatever characters had codes between
the codes for 0 and 9, inclusive.
The Complete Story for UNIX Regular Expressions


The reader who wants to get the complete list of operators and short- 
hands available in the UNIX regular-expression notation can find them 
in the manual pages for various commands. There are some differences 
among the various versions of UNIX, but a command like man grep will 
get you the notation used for the grep command, which is fundamental. 
“Grep” stands for “Global (search for) Regular Expression and Print,” 
incidentally. 


The UNIX command lex and its GNU version, flex, accept as input a list of
regular expressions, in the UNIX style, each followed by a bracketed section of 
code that indicates what the lexical analyzer is to do when it finds an instance 
of that token. Such a facility is called a lexical-analyzer generator, because it 
takes as input a high-level description of a lexical analyzer and produces from 
it a function that is a working lexical analyzer. 

Commands such as lex and flex have been found extremely useful because 
the regular-expression notation is exactly as powerful as we need to describe 
tokens. These commands are able to use the regular-expression-to-DFA con- 
version process to generate an efficient function that breaks source programs 
into tokens. They make the implementation of a lexical analyzer an afternoon’s 
work, while before the development of these regular-expression-based tools, the 
hand-generation of the lexical analyzer could take months. Further, if we need 
to modify the lexical analyzer for any reason, it is often a simple matter to 
change a regular expression or two, instead of having to go into mysterious 
code to fix a bug. 


Example 3.9: In Fig. 3.19 is an example of partial input to the lex command, 
describing some of the tokens that are found in the language C. The first line 
handles the keyword else and the action is to return a symbolic constant (ELSE 
in this example) to the parser for further processing. The second line contains 
a regular expression describing identifiers: a letter followed by zero or more 
letters and/or digits. The action is first to enter that identifier in the symbol 
table if not already there; lex isolates the token found in a buffer, so this piece 
of code knows exactly what identifier was found. Finally, the lexical analyzer 
returns the symbolic constant ID, which has been chosen in this example to 
represent identifiers. 

The third entry in Fig. 3.19 is for the sign >=, a two-character operator. 
The last example we show is for the sign =, a one-character operator. There 
would in practice appear expressions describing each of the keywords, each of 
the signs and punctuation symbols like commas and parentheses, and families 
of constants such as numbers and strings. Many of these are very simple, 
just a sequence of one or more specific characters. However, some have more
of the flavor of identifiers, requiring the full power of the regular-expression
notation to describe. The integers, floating-point numbers, character strings,
and comments are other examples of sets of strings that profit from the
regular-expression capabilities of commands like lex.


    else                    {return(ELSE);}
    [A-Za-z][A-Za-z0-9]*    {/* code to enter the found identifier,
                               held in the buffer yytext, in the
                               symbol table */
                             return(ID);
                            }
    >=                      {return(GE);}
    =                       {return(ASGN);}


Figure 3.19: A sample of lex input


The conversion of a collection of expressions, such as those suggested in 
Fig. 3.19, to an automaton proceeds approximately as we have described for- 
mally in the preceding sections. We start by building an automaton for the 
union of all the expressions. This automaton in principle tells us only that 
some token has been recognized. However, if we follow the construction of
Theorem 3.7 for the union of expressions, the ε-NFA state we reach (the accepting
state of one of the component automata) tells us exactly which token has been
recognized.

The only problem is that more than one token may be recognized at once; 
for instance, the string else matches not only the regular expression else but 
also the expression for identifiers. The standard resolution is for the lexical- 
analyzer generator to give priority to the first expression listed. Thus, if we 
want keywords like else to be reserved (not usable as identifiers), we simply 
list them ahead of the expression for identifiers. 
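The priority rule can be imitated in a few lines. The Python sketch below is our own approximation of a lex-generated analyzer, not its implementation: at each position it tries every rule, keeps the longest match (the policy the flex manual documents), and breaks ties in favor of the rule listed first, which is why else becomes ELSE while elsewhere is still an identifier. The token names follow Fig. 3.19; the whitespace rule is an assumption added so that the sample input can contain blanks.

```python
import re

# Rules in priority order; token names follow Fig. 3.19.
RULES = [
    ("ELSE", re.compile(r"else")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("GE", re.compile(r">=")),
    ("ASGN", re.compile(r"=")),
    (None, re.compile(r"[ \t\n]+")),  # assumed: whitespace is skipped, not returned
]

def tokenize(text):
    """Split text into (token, lexeme) pairs, longest match first."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, -1
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match displaces an earlier rule,
            # so among equally long matches the first-listed rule wins.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_end <= pos:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name is not None:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("else x1 >= elsewhere = y"))
# [('ELSE', 'else'), ('ID', 'x1'), ('GE', '>='), ('ID', 'elsewhere'), ('ASGN', '='), ('ID', 'y')]
```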


3.3.3 Finding Patterns in Text 


In Section 2.4.1 we introduced the notion that automata could be used to search 
efficiently for a set of words in a large repository such as the Web. While the 
tools and technology for doing so are not so well developed as those for lexical
analyzers, the regular-expression notation is valuable for describing searches 
for interesting patterns. As for lexical analyzers, the capability to go from 
the natural, descriptive regular-expression notation to an efficient (automaton- 
based) implementation offers substantial intellectual leverage. 

The general problem for which regular-expression technology has been found 
useful is the description of a vaguely defined class of patterns in text. The 
vagueness of the description virtually guarantees that we shall not describe
the pattern correctly at first — perhaps we can never get exactly the right 
description. By using regular-expression notation, it becomes easy to describe 
the patterns at a high level, with little effort, and to modify the description 
quickly when things go wrong. A “compiler” for regular expressions is useful 
to turn the expressions we write into executable code. 

Let us explore an extended example of the sort of problem that arises in 
many Web applications. Suppose that we want to scan a very large number of 
Web pages and detect addresses. We might simply want to create a mailing 
list. Or, perhaps we are trying to classify businesses by their location so that 
we can answer queries like “find me a restaurant within 10 minutes’ drive of
where I am now.” 

We shall focus on recognizing street addresses in particular. What is a street 
address? We’ll have to figure that out, and if, while testing the software, we 
find we miss some cases, we'll have to modify the expressions to capture what 
we were missing. To begin, a street address will probably end in “Street” or its 
abbreviation, “St.” However, some people live on “Avenues” or “Roads,” and 
these might be abbreviated in the address as well. Thus, we might use as the 
ending for our regular expression something like: 


Street|St\.|Avenue|Ave\.|Road|Rd\.


In the above expression, we have used UNIX-style notation, with the vertical 
bar, rather than +, as the union operator. Note also that the dots are escaped 
with a preceding backslash, since dot has the special meaning of “any character” 
in UNIX expressions, and in this case we really want only the period or “dot” 
character to end the three abbreviations. 

The designation such as Street must be preceded by the name of the street. 
Usually, the name is a capital letter followed by some lower-case letters. We 
can describe this pattern by the UNIX expression [A-Z][a-z]*. However,
some streets have a name consisting of more than one word, such as Rhode 
Island Avenue in Washington DC. Thus, after discovering that we were missing 
addresses of this form, we could revise our description of street names to be 


'[A-Z][a-z]*( [A-Z][a-z]*)*'


The expression above starts with a group consisting of a capital and zero 
or more lower-case letters. There follow zero or more groups consisting of a 
blank, another capital letter, and zero or more lower-case letters. The blank 
is an ordinary character in UNIX expressions, but to avoid having the above 
expression look like two expressions separated by a blank in a UNIX command 
line, we are required to place quotation marks around the whole expression. 
The quotes are not part of the expression itself. 

Now, we need to include the house number as part of the address. Most 
house numbers are a string of digits. However, some will have a letter follow- 
ing, as in “123A Main St.” Thus, the expression we use for numbers has an 
optional capital letter following: [0-9]+[A-Z]?. Notice that we use the UNIX 
+ operator for “one or more” digits and the ? operator for “zero or one” capital 
letter. The entire expression we have developed for street addresses is: 


'[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* (Street|St\.|Avenue|Ave\.|Road|Rd\.)'


If we work with this expression, we shall do fairly well. However, we shall 
eventually discover that we are missing: 


1. Streets that are called something other than a street, avenue, or road. For 
example, we shall miss “Boulevard,” “Place,” “Way,” and their abbrevi- 
ations. 


2. Street names that are numbers, or partially numbers, like “42nd Street.” 
3. Post-Office boxes and rural-delivery routes. 


4. Street names that don’t end in anything like “Street.” An example is El 
Camino Real in Silicon Valley. Being Spanish for “the royal road,” saying 
“El Camino Real Road” would be redundant, so one has to deal with 
complete addresses like “2000 El Camino Real.” 


5. All sorts of strange things we can’t even imagine. Can you? 


Thus, having a regular-expression compiler can make the process of slow con- 
vergence to the complete recognizer for addresses much easier than if we had 
to recode every change directly in a conventional programming language. 
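To make this convergence process concrete, here is a small Python sketch (an illustration, not part of the original text) that compiles the street-address expression with Python's re module, whose syntax accepts these UNIX-style operators unchanged, and tries it on a few addresses. The sample addresses are made up; the last three show kinds of misses listed above.

```python
import re

# The street-address expression from the text, written for Python's re
# module; no shell quoting is needed inside a Python string.
ADDRESS = re.compile(
    r"[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* "
    r"(Street|St\.|Avenue|Ave\.|Road|Rd\.)"
)


def is_address(text: str) -> bool:
    """True if the entire string matches the address expression."""
    return ADDRESS.fullmatch(text) is not None


samples = [
    "123A Main St.",             # house number with a letter suffix
    "1234 Rhode Island Avenue",  # multi-word street name
    "2000 El Camino Real",       # no Street/Avenue/Road ending: missed
    "350 42nd Street",           # numbered street name: missed
    "12 Sunset Boulevard",       # "Boulevard" is not covered: missed
]
for s in samples:
    print(f"{s!r:28} {is_address(s)}")
```

Each time a miss is discovered, only the pattern string changes; the surrounding code stays the same, which is the advantage the text attributes to having a regular-expression compiler.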


3.3.4 Exercises for Section 3.3 


Exercise 3.3.1: Give a regular expression to describe phone numbers in all 
the various forms you can think of. Consider international numbers as well as 
the fact that different countries have different numbers of digits in area codes 
and in local phone numbers. 


Exercise 3.3.2: Give a regular expression to represent salaries as they might 
appear in employment advertising. Consider that salaries might be given on 
a per hour, week, month, or year basis. They may or may not appear with a 
dollar sign, or other unit such as “K” following. There may be a word or words 
nearby that identify a salary. Suggestion: look at classified ads in a newspaper, 
or on-line jobs listings to get an idea of what patterns might be useful. 


Exercise 3.3.3: At the end of Section 3.3.3 we gave some examples of improve- 
ments that could be possible for the regular expression that describes addresses. 
Modify the expression developed there to include all the mentioned options. 
3.4 Algebraic Laws for Regular Expressions 


In Example 3.5, we saw the need for simplifying regular expressions, in order to 
keep the size of expressions manageable. There, we gave some ad-hoc arguments 
why one expression could be replaced by another. In all cases, the basic issue 
was that the two expressions were equivalent, in the sense that they defined 
the same languages. In this section, we shall offer a collection of algebraic 
laws that bring to a higher level the issue of when two regular expressions are 
equivalent. Instead of examining specific regular expressions, we shall consider 
pairs of regular expressions with variables as arguments. Two expressions with 
variables are equivalent if whatever languages we substitute for the variables, 
the results of the two expressions are the same language. 

An example of this process in the algebra of arithmetic is as follows. It is 
one matter to say that 1+2 = 2+1. That is an example of the commutative law 
of addition, and it is easy to check by applying the addition operator on both 
sides and getting 3 = 3. However, the commutative law of addition says more; 
it says that x +y = y + x, where x and y are variables that can be replaced 
by any two numbers. That is, no matter what two numbers we add, we get the 
same result regardless of the order in which we sum them. 

Like arithmetic expressions, the regular expressions have a number of laws 
that work for them. Many of these are similar to the laws for arithmetic, if we 
think of union as addition and concatenation as multiplication. However, there 
are a few places where the analogy breaks down, and there are also some laws 
that apply to regular expressions but have no analog for arithmetic, especially 
when the closure operator is involved. The next sections form a catalog of the 
major laws. We conclude with a discussion of how one can check whether a 
proposed law for regular expressions is indeed a law; i.e., it will hold for any 
languages that we may substitute for the variables. 


3.4.1 Associativity and Commutativity 


Commutativity is the property of an operator that says we can switch the order 
of its operands and get the same result. An example for arithmetic was given 
above: x + y = y + x. Associativity is the property of an operator that allows 
us to regroup the operands when the operator is applied twice. For example, 
the associative law of multiplication is (x × y) × z = x × (y × z). Here are three 
laws of these types that hold for regular expressions: 


• L + M = M + L. This law, the commutative law for union, says that we 
may take the union of two languages in either order. 


• (L + M) + N = L + (M + N). This law, the associative law for union, 
says that we may take the union of three languages either by taking the 
union of the first two initially, or taking the union of the last two initially. 
Note that, together with the commutative law for union, we conclude 
that we can take the union of any collection of languages with any order 
and grouping, and the result will be the same. Intuitively, a string is in 
L_1 ∪ L_2 ∪ ··· ∪ L_k if and only if it is in one or more of the L_i’s. 


• (LM)N = L(MN). This law, the associative law for concatenation, says 
that we can concatenate three languages by concatenating either the first 
two or the last two initially. 


Missing from this list is the “law” LM = ML, which would say that con- 
catenation is commutative. However, this law is false. 


Example 3.10: Consider the regular expressions 01 and 10. These expressions 
denote the languages {01} and {10}, respectively. Since the languages are 
different, the general law LM = ML cannot hold. If it did, we could substitute 
the regular expression 0 for L and 1 for M and conclude falsely that 01 = 10. 
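One way to experiment with these laws is to model finite languages as Python sets of strings. The sketch below assumes that representation (the helper name concat is ours, not the text's); it checks the commutative and associative laws for union and the associative law for concatenation on one small example, and reproduces the counterexample of Example 3.10. A check on particular languages is only evidence for a law, never a proof of it.

```python
from itertools import product


def concat(L, M):
    """Concatenation LM of finite languages given as sets of strings."""
    return {x + y for x, y in product(L, M)}


L, M, N = {"0"}, {"1"}, {"", "01"}

assert L | M == M | L                      # union is commutative
assert (L | M) | N == L | (M | N)          # union is associative
assert concat(concat(L, M), N) == concat(L, concat(M, N))  # concatenation is associative

# Concatenation is not commutative (Example 3.10): LM = {01}, ML = {10}.
print(concat(L, M), concat(M, L))  # {'01'} {'10'}
```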


3.4.2 Identities and Annihilators 


An identity for an operator is a value such that when the operator is applied to 
the identity and some other value, the result is the other value. For instance, 
0 is the identity for addition, since 0 + x = x + 0 = x, and 1 is the identity 
for multiplication, since 1 × x = x × 1 = x. An annihilator for an operator 
is a value such that when the operator is applied to the annihilator and some 
other value, the result is the annihilator. For instance, 0 is an annihilator for 
multiplication, since 0 × x = x × 0 = 0. There is no annihilator for addition. 

There are three laws for regular expressions involving these concepts; we list 
them below. 


• ∅ + L = L + ∅ = L. This law asserts that ∅ is the identity for union. 
• εL = Lε = L. This law asserts that ε is the identity for concatenation. 
• ∅L = L∅ = ∅. This law asserts that ∅ is the annihilator for concatenation. 


These laws are powerful tools in simplifications. For example, if we have a 
union of several expressions, some of which are, or have been simplified to, ∅, 
then the ∅’s can be dropped from the union. Likewise, if we have a concatenation 
of several expressions, some of which are, or have been simplified to, ε, we can 
drop the ε’s from the concatenation. Finally, if we have a concatenation of any 
number of expressions, and even one of them is ∅, then the entire concatenation 
can be replaced by ∅. 
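In the same set-of-strings model (again a sketch; the names EMPTY, EPS, and concat are ours), the three laws look as follows. Note the difference between ∅, the language with no strings, and {ε}, the language whose one string is empty; confusing the two is a common source of errors in simplification.

```python
from itertools import product


def concat(L, M):
    """Concatenation LM of finite languages given as sets of strings."""
    return {x + y for x, y in product(L, M)}


EMPTY = set()  # ∅: the language with no strings
EPS = {""}     # {ε}: the language whose only string is empty
L = {"0", "11", "101"}

assert EMPTY | L == L | EMPTY == L                    # ∅ + L = L + ∅ = L
assert concat(EPS, L) == concat(L, EPS) == L          # εL = Lε = L
assert concat(EMPTY, L) == concat(L, EMPTY) == EMPTY  # ∅L = L∅ = ∅
assert EMPTY != EPS                                   # ∅ and {ε} are different languages
```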


3.4.3 Distributive Laws 


A distributive law involves two operators, and asserts that one operator can be 
pushed down to be applied to each argument of the other operator individually. 
The most common example from arithmetic is the distributive law of multiplication 
over addition, that is, x × (y + z) = x × y + x × z. Since multiplication is 
commutative, it doesn’t matter whether the multiplication is on the left or right 
of the sum. There is an analogous law for regular expressions, but we 
must state it in two forms, since concatenation is not commutative. These laws 
are: 


• L(M + N) = LM + LN. This law is the left distributive law of concatenation 
over union. 


• (M + N)L = ML + NL. This law is the right distributive law of concatenation 
over union. 


Let us prove the left distributive law; the other is proved similarly. The 
proof will refer to languages only; it does not depend on the languages having 
regular expressions. 


Theorem 3.11: If L, M, and N are any languages, then 
L(M ∪ N) = LM ∪ LN 


PROOF: The proof is similar to another proof about a distributive law that we 
saw in Theorem 1.10. We need to show that a string w is in L(M ∪ N) if 
and only if it is in LM ∪ LN. 


(Only-if) If w is in L(M ∪ N), then w = xy, where x is in L and y is in either 
M or N. If y is in M, then xy is in LM, and therefore in LM ∪ LN. Likewise, 
if y is in N, then xy is in LN and therefore in LM ∪ LN. 


(If) Suppose w is in LM ∪ LN. Then w is in either LM or LN. Suppose 
first that w is in LM. Then w = xy, where x is in L and y is in M. As y is in 
M, it is also in M ∪ N. Thus, xy is in L(M ∪ N). If w is not in LM, then it 
is surely in LN, and a similar argument shows it is in L(M ∪ N). 
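Theorem 3.11 holds for all languages, so no finite test can prove it, but a randomized check is a cheap way to catch a misstated law. The following sketch (hypothetical helper names; finite languages as Python sets of strings) tests both distributive laws on 1000 random triples of small languages over {0, 1}.

```python
import random
from itertools import product


def concat(L, M):
    """Concatenation LM of finite languages given as sets of strings."""
    return {x + y for x, y in product(L, M)}


def random_language(rng, alphabet="01", max_len=3, size=4):
    """A random finite language of at most `size` strings, each of length <= max_len."""
    return {
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        for _ in range(size)
    }


rng = random.Random(2024)  # fixed seed so that the check is reproducible
for _ in range(1000):
    L, M, N = (random_language(rng) for _ in range(3))
    assert concat(L, M | N) == concat(L, M) | concat(L, N)  # L(M + N) = LM + LN
    assert concat(M | N, L) == concat(M, L) | concat(N, L)  # (M + N)L = ML + NL
print("both distributive laws held on 1000 random triples")
```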


Example 3.12: Consider the regular expression 0 + 01*. We can “factor out a 
0” from the union, but first we have to recognize that the expression 0 by itself 
is actually the concatenation of 0 with something, namely ε. That is, we use 
the identity law for concatenation to replace 0 by 0ε, giving us the expression 
0ε + 01*. Now, we can apply the left distributive law to replace this expression 
by 0(ε + 1*). If we further recognize that ε is in L(1*), then we observe that 
ε + 1* = 1*, and can simplify to 01*. 


3.4.4 The Idempotent Law 


An operator is said to be idempotent if the result of applying it to two of the 
same values as arguments is that value. The common arithmetic operators are 
not idempotent; x + x ≠ x in general and x × x ≠ x in general (although there 
are some values of x for which the equality holds, such as 0 + 0 = 0). However, 
union and intersection are common examples of idempotent operators. Thus, 
for regular expressions, we may assert the following law: 
• L + L = L. This law, the idempotence law for union, states that if we 
take the union of two identical expressions, we can replace them by one 
copy of the expression. 


3.4.5 Laws Involving Closures 


There are a number of laws involving the closure operator and its UNIX-style 
variants + and ?. We shall list them here, and give some explanation for why 
they are true. 


• (L*)* = L*. This law says that closing an expression that is already 
closed does not change the language. The language of (L*)* is all strings 
created by concatenating strings in the language of L*. But those strings 
are themselves composed of strings from L. Thus, any string in (L*)* is 
also a concatenation of strings from L and is therefore in the language of 
L*. 


• ∅* = ε. The closure of ∅ contains only the string ε, as we discussed in 
Example 3.6. 


• ε* = ε. It is easy to check that the only string that can be formed by 
concatenating any number of copies of the empty string is the empty 
string itself. 


• L⁺ = LL* = L*L. Recall that L⁺ is defined to be L + LL + LLL + ···. 
Also, L* = ε + L + LL + LLL + ···. Thus, 


LL* = Lε + LL + LLL + LLLL + ··· 


When we remember that Lε = L, we see that the infinite expansions for 
LL* and for L⁺ are the same. That proves L⁺ = LL*. The proof that 
L⁺ = L*L is similar.⁴ 


• L* = L⁺ + ε. The proof is easy, since the expansion of L⁺ includes every 
term in the expansion of L* except ε. Note that if the language L contains 
the string ε, then the additional “+ε” term is not needed; that is, L⁺ = L* 
in this special case. 


• L? = ε + L. This rule is really the definition of the ? operator. 
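Closures are infinite as soon as L contains a nonempty string, so to test the closure laws mechanically we can compare only the strings up to some length n. The sketch below (our own helper names star, plus, and upto; finite sets of strings as languages) computes L* and L⁺ truncated to length n by two independent breadth-first constructions, then checks each law in this list for L = {0, 11} and n = 8. Truncation is sound here because every partial concatenation of whole strings of L is no longer than the final string.

```python
from itertools import product


def concat(L, M):
    """Concatenation LM of finite languages given as sets of strings."""
    return {x + y for x, y in product(L, M)}


def upto(L, n):
    """The strings of L whose length is at most n."""
    return {w for w in L if len(w) <= n}


def star(L, n):
    """Strings of L* = ε + L + LL + ... of length <= n, built breadth-first."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = upto(concat(frontier, L), n) - result  # append one more string of L
        result |= frontier
    return result


def plus(L, n):
    """Strings of L+ = L + LL + LLL + ... of length <= n, built independently of star."""
    result, frontier = set(), upto(L, n)
    while frontier:
        result |= frontier
        frontier = upto(concat(frontier, L), n) - result
    return result


L, n = {"0", "11"}, 8
S = star(L, n)
assert star(S, n) == S                                               # (L*)* = L*
assert star(set(), n) == {""}                                        # ∅* = ε
assert star({""}, n) == {""}                                         # ε* = ε
assert plus(L, n) == upto(concat(L, S), n) == upto(concat(S, L), n)  # L+ = LL* = L*L
assert S == plus(L, n) | {""}                                        # L* = L+ + ε
assert plus(L | {""}, n) == star(L | {""}, n)                        # L+ = L* when ε is in L
```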


3.4.6 Discovering Laws for Regular Expressions 


Each of the laws above was proved, formally or informally. However, there is 
an infinite variety of laws about regular expressions that might be proposed. 
Is there a general methodology that will make our proofs of the correct laws 
easy? It turns out that the truth of a law reduces to a question of the equality 
of two specific languages. Interestingly, the technique is closely tied to the 
regular-expression operators, and cannot be extended to expressions involving 
some other operators, such as intersection. 


⁴Notice that, as a consequence, any language L commutes (under concatenation) with its 
own closure; LL* = L*L. That rule does not contradict the fact that, in general, concatenation 
is not commutative. 

To see how this test works, let us consider a proposed law, such as 


(L+ M)* = (L*M*)* 


This law says that if we have any two languages L and M, and we close their 
union, we get the same language as if we take the language L*M*, that is, 
all strings composed of zero or more choices from L followed by zero or more 
choices from M, and close that language. 

To prove this law, suppose first that string w is in the language of (L + M)*.⁵ 
Then we can write w = w_1w_2···w_k for some k, where each w_i is in either L or 
M. It follows that each w_i is in the language of L*M*. To see why, if w_i is in 
L, pick one string, w_i, from L; this string is also in L*. Pick no strings from 
M; that is, pick ε from M*. If w_i is in M, the argument is similar. Once every 
w_i is seen to be in L*M*, it follows that w is in the closure of this language. 

To complete the proof, we also have to prove the converse: that strings 
in (L* M*)* are also in (L + M)*. We omit this part of the proof, since our 
objective is not to prove the law, but to notice the following important property 
of regular expressions. 

Any regular expression with variables can be thought of as a concrete regular 
expression, one that has no variables, by thinking of each variable as if it were a 
distinct symbol. For example, the expression (L+ M)* can have variables L and 
M replaced by symbols a and b, respectively, giving us the regular expression 
(a+b)*. 

The language of the concrete expression guides us regarding the form of 
strings in any language that is formed from the original expression when we 
replace the variables by languages. Thus, in our analysis of (L + M)*, we 
observed that any string w composed of a sequence of choices from either L or 
M, would be in the language of (L + M)*. We can arrive at that conclusion 
by looking at the language of the concrete expression, L((a + b)*), which is 
evidently the set of all strings of a’s and b’s. We could substitute any string in 
L for any occurrence of a in one of those strings, and we could substitute any 
string in M for any occurrence of b, with possibly different choices of strings for 
different occurrences of a or b. Those substitutions, applied to all the strings 
in (a + b)*, give us all strings formed by concatenating strings from L and/or 
M, in any order. 

The above statement may seem obvious, but as is pointed out in the box 
on “Extensions of the Test Beyond Regular Expressions May Fail,” it is not 
even true when some other operators are added to the three regular-expression 
operators. We prove the general principle for regular expressions in the next 
theorem. 


⁵For simplicity, we shall identify the regular expressions and their languages, and avoid 
saying “the language of” in front of every regular expression. 
Theorem 3.13: Let E be a regular expression with variables L_1, L_2, ..., L_m. 
Form concrete regular expression C by replacing each occurrence of L_i by the 
symbol a_i, for i = 1, 2, ..., m. Then for any languages L_1, L_2, ..., L_m, every 
string w in L(E) can be written w = w_1w_2···w_k, where each w_i is in one of 
the languages, say L_{j_i}, and the string a_{j_1}a_{j_2}···a_{j_k} is in the language L(C). 
Less formally, we can construct L(E) by starting with each string in L(C), 
say a_{j_1}a_{j_2}···a_{j_k}, and substituting for each of the a_{j_i}’s any string from the 
corresponding language L_{j_i}. 


PROOF: The proof is a structural induction on the expression E. 


BASIS: The basis cases are where E is ε, ∅, or a variable L. In the first two 
cases, there is nothing to prove, since the concrete expression C is the same as 
E. If E is a variable L, then L(E) = L. The concrete expression C is just a, 
where a is the symbol corresponding to L. Thus, L(C) = {a}. If we substitute 
any string in L for the symbol a in this one string, we get the language L, which 
is also L(E). 


INDUCTION: There are three cases, depending on the final operator of E. 
First, suppose that E = F + G; i.e., a union is the final operator. Let C and D 
be the concrete expressions formed from F and G, respectively, by substituting 
concrete symbols for the language-variables in these expressions. Note that the 
same symbol must be substituted for all occurrences of the same variable, in 
both F and G. Then the concrete expression that we get from E is C + D, and 
L(C + D) = L(C) + L(D). 

Suppose that w is a string in L(E), when the language variables of E are 
replaced by specific languages. Then w is in either L(F) or L(G). By the 
inductive hypothesis, w is obtained by starting with a concrete string in L(C) or 
L(D), respectively, and substituting for the symbols strings in the corresponding 
languages. Thus, in either case, the string w can be constructed by starting 
with a concrete string in L(C+D), and making the same substitutions of strings 
for symbols. 

We must also consider the cases where E is FG or F*. However, the ar- 
guments are similar to the union case above, and we leave them for you to 
complete. 
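The substitution described in Theorem 3.13 can be carried out mechanically when L(C) and the substituted languages are finite. In this sketch (the function substitute and the example languages are hypothetical), each occurrence of a symbol gets its own independent choice of string, and the result for E = L + ML, whose concrete expression is C = a + ba, agrees with evaluating L + ML directly.

```python
from itertools import product


def concat(L, M):
    """Concatenation LM of finite languages given as sets of strings."""
    return {x + y for x, y in product(L, M)}


def substitute(concrete, languages):
    """Replace each symbol occurrence in each concrete string by any string of
    the language assigned to that symbol; occurrences are chosen independently."""
    result = set()
    for s in concrete:
        for pieces in product(*(languages[c] for c in s)):
            result.add("".join(pieces))
    return result


# Hypothetical languages for the variables: L -> a, M -> b.
languages = {"a": {"x"}, "b": {"y", "zz"}}
L, M = languages["a"], languages["b"]

# E = L + ML has concrete expression C = a + ba, so L(C) = {a, ba}.
assert substitute({"a", "ba"}, languages) == L | concat(M, L)
print(sorted(substitute({"a", "ba"}, languages)))  # ['x', 'yx', 'zzx']
```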


3.4.7 The Test for a Regular-Expression Algebraic Law 


Now, we can state and prove the test for whether or not a law of regular 
expressions is true. The test for whether E = F is true, where E and F are 
two regular expressions with the same set of variables, is: 


1. Convert E and F to concrete regular expressions C and D, respectively, 
by replacing each variable by a concrete symbol. 


2. Test whether L(C) = L(D). If so, then E = F is a true law, and if not, 
then the “law” is false. Note that we shall not see the test for whether two 
regular expressions denote the same language until Section 4.4. However, 
we can use ad-hoc means to decide the equality of the pairs of languages 
that we actually care about. Recall that if the languages are not the same, 
then it is sufficient to provide one counterexample: a single string that is 
in one language but not the other. 


Theorem 3.14: The above test correctly identifies the true laws for regular 
expressions. 


PROOF: We shall show that L(E) = L(F) for any languages in place of the 
variables of E and F if and only if L(C) = L(D). 


(Only-if) Suppose L(E) = L(F) for all choices of languages for the variables. 
In particular, choose for every variable L the language {a}, where a is the 
concrete symbol that replaces L in expressions C and D. Then for this choice, 
L(C) = L(E), and L(D) = L(F). Since L(E) = L(F) is given, it follows that 
L(C) = L(D). 


(If) Suppose L(C) = L(D). By Theorem 3.13, L(E) and L(F) are each 
constructed by replacing the concrete symbols of strings in L(C) and L(D), 
respectively, by strings in the languages that correspond to those symbols. If 
the strings of L(C) and L(D) are the same, then the two languages constructed 
in this manner will also be the same; that is, L(E) = L(F). 


Example 3.15: Consider the prospective law (L + M)* = (L*M*)*. If we 
replace variables L and M by concrete symbols a and b respectively, we get the 
regular expressions (a + b)* and (a*b*)*. It is easy to check that both these 
expressions denote the language with all strings of a’s and b’s. Thus, the two 
concrete expressions denote the same language, and the law holds. 

For another example of a law, consider L* = L* L*. The concrete languages 
are a* and a*a*, respectively, and each of these is the set of all strings of a’s. 
Again, the law is found to hold; that is, concatenation of a closed language with 
itself yields that language. 

Finally, consider the prospective law L+ ML = (L + M)L. If we choose 
symbols a and b for variables L and M, respectively, we have the two concrete 
regular expressions a + ba and (a + b)a. However, the languages of these 
expressions are not the same. For example, the string aa is in the second, but 
not the first. Thus, the prospective law is false. 
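Step 2 of the test can be approximated by brute force: enumerate all strings over the concrete alphabet, shortest first, and look for one on which the two concrete expressions disagree. The sketch below does this with Python's re.fullmatch (writing + as |) up to an arbitrary length bound of 8. A returned string is a genuine counterexample; a result of None only shows agreement on short strings and is not the decision procedure of Section 4.4. For the third law it reports a, a shorter counterexample than aa: a is in L(a + ba) but not in L((a + b)a).

```python
import re
from itertools import product


def first_difference(c, d, alphabet="ab", max_len=8):
    """Return a shortest string over `alphabet` (up to max_len) that is in
    exactly one of L(c) and L(d), or None if there is none that short.
    None is evidence, not proof, that L(c) = L(d)."""
    pc, pd = re.compile(c), re.compile(d)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (pc.fullmatch(w) is None) != (pd.fullmatch(w) is None):
                return w
    return None


# The three laws of Example 3.15, concretized with L -> a and M -> b.
print(first_difference(r"(a|b)*", r"(a*b*)*"))  # None
print(first_difference(r"a*", r"a*a*"))         # None
print(first_difference(r"a|ba", r"(a|b)a"))     # a
```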


3.4.8 Exercises for Section 3.4 


Exercise 3.4.1: Verify the following identities involving regular expressions. 
*a) R+S=S+R. 

b) (R+S)+T=R+(S+T). 

c) (RS)T = R(ST). 

d) R(S+T)=RS + RT. 

e) (R+S)T = RT + ST. 

*f) (R*)* = R*. 

g) (ε + R)* = R*. 

h) (R*S*)* = (R+S)*. 


Extensions of the Test Beyond Regular Expressions May Fail 


Let us consider an extended regular-expression algebra that includes 
the intersection operator. Interestingly, adding ∩ to the three regular-expression 
operators does not increase the set of languages we can describe, 
as we shall see in Theorem 4.8. However, it does make the test for 
algebraic laws invalid. 

Consider the “law” L ∩ M ∩ N = L ∩ M; that is, the intersection of 
any three languages is the same as the intersection of the first two of these 
languages. This “law” is patently false. For example, let L = M = {a} 
and N = ∅. But the test based on concretizing the variables would fail to 
see the difference. That is, if we replaced L, M, and N by the symbols a, 
b, and c, respectively, we would test whether {a} ∩ {b} ∩ {c} = {a} ∩ {b}. 
Since both sides are the empty set, the equality of languages holds and 
the test would imply that the “law” is true. 
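The counterexample in the box on extensions of the test can be replayed with Python sets, where & denotes intersection (a small illustration; the variable names are ours):

```python
# Finite languages as Python sets; & is set intersection.
L, M, N = {"a"}, {"a"}, set()   # the box's counterexample
print(L & M & N, L & M)         # set() {'a'}
assert L & M & N != L & M       # so the "law" is false

# Concretizing L, M, N as distinct symbols a, b, c hides the difference.
a, b, c = {"a"}, {"b"}, {"c"}
print(a & b & c == a & b)       # True: both sides are empty
```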

! Exercise 3.4.2: Prove or disprove each of the following statements about 
regular expressions. 


*a) (R+S)* = R* + S*. 
b) (RS + R)*R = R(SR + R)*. 

* c) (RS + R)*RS = (RR*S)*. 
d) (R+S)*S = (R*S)*. 
e) S(RS + S)*R = RR*S(RR*S)*. 


Exercise 3.4.3: In Example 3.6, we developed the regular expression 
(0+ 1)*1(0 +1) + (0+ 1)*1(0+ 1)(0+ 1) 


Use the distributive laws to develop two different, simpler, equivalent expres- 
sions. 
Exercise 3.4.4: At the beginning of Section 3.4.6, we gave part of a proof that 
(L* M*)* = (L+ M)*. Complete the proof by showing that strings in (L* M*)* 
are also in (L + M)*. 


! Exercise 3.4.5: Complete the proof of Theorem 3.13 by handling the cases 
where regular expression E is of the form FG or of the form F*. 


3.5 Summary of Chapter 3 


Regular Expressions: This algebraic notation describes exactly the same 
languages as finite automata: the regular languages. The regular-expression 
operators are union, concatenation (or “dot”), and closure (or 
“star”). 


Regular Expressions in Practice: Systems such as UNIX and various of 
its commands use an extended regular-expression language that provides 
shorthands for many common expressions. Character classes allow the 
easy expression of sets of symbols, while operators such as one-or-more-of 
and at-most-one-of augment the usual regular-expression operators. 


Equivalence of Regular Expressions and Finite Automata: We can con- 
vert a DFA to a regular expression by an inductive construction in which 
expressions for the labels of paths allowed to pass through increasingly 
larger sets of states are constructed. Alternatively, we can use a state- 
elimination procedure to build the regular expression for a DFA. In the 
other direction, we can construct recursively an ε-NFA from regular 
expressions, and then convert the ε-NFA to a DFA, if we wish. 


The Algebra of Regular Expressions: Regular expressions obey many of 
the algebraic laws of arithmetic, although there are differences. Union 
and concatenation are associative, but only union is commutative. Con- 
catenation distributes over union. Union is idempotent. 


Testing Algebraic Identities: We can tell whether a regular-expression 
equivalence involving variables as arguments is true by replacing the vari- 
ables by distinct constants and testing whether the resulting languages 
are the same. 


3.6 Gradiance Problems for Chapter 3 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 
Problem 3.1: Here is a finite automaton [shown on-line by the Gradiance 
system]. Which of the following regular expressions defines the same language 
as the finite automaton? Hint: each of the correct choices uses component 
expressions. Some of these components are: 


1. The ways to get from A to D without going through D. 
2. The ways to get from D to itself, without going through D. 
3. The ways to get from A to itself, without going through A. 


It helps to write down these expressions first, and then look for an expression 
that defines all the paths from A to D. 


Problem 3.2: When we convert an automaton to a regular expression, we 
need to build expressions for the labels along paths from one state to another 
state that do not go through certain other states. Below is a nondeterministic 
finite automaton with three states [shown on-line by the Gradiance system]. 
For each of the six orders of the three states, find regular expressions that give 
the set of labels along all paths from the first state to the second state that 
never go through the third state. Then identify one of these expressions from 
the list of choices below. 


Problem 3.3: Identify from the list below the regular expression that gener- 
ates all and only the strings over alphabet {0,1} that end in 1. 


Problem 3.4: Apply the construction in Fig. 3.16 and Fig. 3.17 to convert 
the regular expression (0 + 1)*(0 + ε) to an epsilon-NFA. Then, identify the true 
statement about your epsilon-NFA from the list below. 


Problem 3.5: Consider the following identities for regular expressions; some 
are false and some are true. You are asked to decide which and in case it is 
false to provide the correct counterexample. 


a) R(S+T) = RS+RT 

b) (R*)* = R* 

c) (R*S*)* = (R + S)* 

d) (R+ S)* = R*+ S* 

e) S(RS + S)*R = RR*S(RR*S)* 
f) (RS + R)*R= R(SR + R)* 


Problem 3.6: In this question you are asked to consider the truth or falsehood 
of six equivalences for regular expressions. If the equivalence is true, you must 
also identify the law from which it follows. In each case the statement R = S is 
conventional shorthand for “L(R) = L(S).” The six proposed equivalences are: 
1. 0*1* = 1*0* 

2. 01∅ = ∅ 

3. ε01 = 01 

4. (0* + 1*)0 = 0*0 + 1*0 

5. (0*1)0* = 0*(10*) 

6. 01 + 01 = 01 

Identify the correct statement from the list below. 


Problem 3.7: Which of the following strings is not in the Kleene closure of 
the language {011, 10, 110}? 


Problem 3.8: Here are seven regular expressions [shown on-line by the Gra- 
diance system]. Determine the language of each of these expressions. Then, 
find in the list below a pair of equivalent expressions. 


Problem 3.9: Converting a DFA such as the following [shown on-line by 
the Gradiance system] to a regular expression requires us to develop regular 
expressions for limited sets of paths — those that take the automaton from one 
particular state to another particular state, without passing through some set 
of states. For the automaton above, determine the languages for the following 
limitations: 


1. L_AA = the set of path labels that go from A to A without passing through 
C or D. 


2. L_AB = the set of path labels that go from A to B without passing through 
C or D. 


3. L_BA = the set of path labels that go from B to A without passing through 
C or D. 


4. L_BB = the set of path labels that go from B to B without passing through 
C or D. 


Then, identify a correct regular expression from the list below. 


3.7 References for Chapter 3 


The idea of regular expressions and the proof of their equivalence to finite 
automata is the work of S. C. Kleene [3]. However, the construction of an 
ε-NFA from a regular expression, as presented here, is the “McNaughton-Yamada 
construction,” from [4]. The test for regular-expression identities by treating 
variables as constants was written down by J. Gischer [2]. Although thought to 
be folklore, this report demonstrated how adding several other operations such 
as intersection or shuffle (see Exercise 7.3.4) makes the test fail, even though 
they do not extend the class of languages representable. 

Even before developing UNIX, K. Thompson was investigating the use of 
regular expressions in commands such as grep, and his algorithm for processing 
such commands appears in [5]. The early development of UNIX produced sev- 
eral other commands that make heavy use of the extended regular-expression 
notation, such as M. Lesk’s lex command. A description of this command and 
other regular-expression techniques can be found in [1]. 


1. A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, 
and Tools, Addison-Wesley, Reading MA, 1986. 


2. J. L. Gischer, STAN-CS-TR-84-1033 (1984). 


3. S. C. Kleene, “Representation of events in nerve nets and finite automata,” 
in C. E. Shannon and J. McCarthy, Automata Studies, Princeton Univ. 
Press, 1956, pp. 3-42. 


4. R. McNaughton and H. Yamada, “Regular expressions and state graphs 
for automata,” IEEE Trans. Electronic Computers 9:1 (Jan., 1960), pp. 
39-47. 


5. K. Thompson, “Regular expression search algorithm,” Comm. ACM 11:6 
(June, 1968), pp. 419-422. 


Chapter 4 


Properties of Regular Languages 


This chapter explores the properties of regular languages. Our first tool for 
this exploration is a way to prove that certain languages are not regular. This 
theorem, called the “pumping lemma,” is introduced in Section 4.1. 


One important kind of fact about the regular languages is called a “closure 
property.” These properties let us build recognizers for languages that are 
constructed from other languages by certain operations. As an example, the 
intersection of two regular languages is also regular. Thus, given automata 
that recognize two different regular languages, we can construct mechanically 
an automaton that recognizes exactly the intersection of these two languages. 
Since the automaton for the intersection may have many more states than either 
of the two given automata, this “closure property” can be a useful tool for 
building complex automata. Section 2.1 used this construction in an essential 
way. 
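As a preview of the closure property mentioned above (closure properties are treated in Section 4.2), here is one standard way to build the intersection automaton, the product construction, sketched in Python. The encoding is an assumption for illustration: a DFA is a triple (transition dict keyed by (state, symbol), start state, set of accepting states), and the two example DFAs are hypothetical. The product of two 2-state DFAs has 2 × 2 = 4 states, which illustrates why the intersection automaton can be larger than either input.

```python
from itertools import product


def intersect(dfa1, dfa2, alphabet):
    """Product construction: states are pairs (p, q); a pair accepts
    exactly when both p and q accept. Assumes both DFAs are complete,
    i.e. every (state, symbol) pair has a transition."""
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    states1 = {p for p, _ in d1}
    states2 = {q for q, _ in d2}
    delta = {
        ((p, q), a): (d1[p, a], d2[q, a])
        for p, q, a in product(states1, states2, alphabet)
    }
    accepting = {(p, q) for p in f1 for q in f2}
    return delta, (s1, s2), accepting


def accepts(dfa, w):
    """Run a DFA (delta, start, accepting) on the string w."""
    delta, q, accepting = dfa
    for ch in w:
        q = delta[q, ch]
    return q in accepting


# Hypothetical DFAs over {0, 1}.
# EVEN0: even number of 0's (states "e"/"o"); ENDS1: ends in 1 (states "n"/"y").
EVEN0 = ({("e", "0"): "o", ("e", "1"): "e",
          ("o", "0"): "e", ("o", "1"): "o"}, "e", {"e"})
ENDS1 = ({("n", "0"): "n", ("n", "1"): "y",
          ("y", "0"): "n", ("y", "1"): "y"}, "n", {"y"})

BOTH = intersect(EVEN0, ENDS1, "01")

# Compare with the intended language on every string of length <= 6.
for n in range(7):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        expected = w.count("0") % 2 == 0 and w.endswith("1")
        assert accepts(BOTH, w) == expected
print(len({q for q, _ in BOTH[0]}), "states in the product")  # 4 states in the product
```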


Some other important facts about regular languages are called “decision 
properties.” Our study of these properties gives us algorithms for answering 
important questions about automata. A central example is an algorithm for 
deciding whether two automata define the same language. A consequence of 
our ability to decide this question is that we can “minimize” automata, that 
is, find an equivalent to a given automaton that has as few states as possible. 
This problem has been important in the design of switching circuits for decades, 
since the cost of the circuit (area of a chip that the circuit occupies) tends to 
decrease as the number of states of the automaton implemented by the circuit 
decreases. 
4.1 Proving Languages Not to Be Regular


We have established that the class of languages known as the regular languages 
has at least four different descriptions. They are the languages accepted by 
DFA’s, by NFA’s, and by $\epsilon$-NFA’s; they are also the languages defined by regular
expressions. 

Not every language is a regular language. In this section, we shall introduce 
a powerful technique, known as the “pumping lemma,” for showing certain 
languages not to be regular. We then give several examples of nonregular 
languages. In Section 4.2 we shall see how the pumping lemma can be used in 
tandem with closure properties of the regular languages to prove other languages 
not to be regular. 


4.1.1 The Pumping Lemma for Regular Languages 


Let us consider the language $L_{01} = \{0^n1^n \mid n \geq 1\}$. This language
contains all strings 01, 0011, 000111, and so on, that consist of one or more
0’s followed by an equal number of 1’s. We claim that $L_{01}$ is not a regular
language. The intuitive argument is that if $L_{01}$ were regular, then $L_{01}$
would be the language of some DFA A. This automaton has some particular number
of states, say k states. Imagine this automaton receiving k 0’s as input. It is
in some state after consuming each of the k + 1 prefixes of the input:
$\epsilon, 0, 00, \ldots, 0^k$. Since there are only k different states, the
pigeonhole principle tells us that after reading two different prefixes, say
$0^i$ and $0^j$, A must be in the same state, say state q.

Now suppose that after reading i or j 0’s, the automaton A starts receiving
1’s as input. After receiving i 1’s, it must accept if it previously received
i 0’s, but not if it received j 0’s. Since it was in state q when the 1’s
started, it cannot “remember” whether it received i or j 0’s, so we can “fool”
A and make it do the wrong thing: accept if it should not, or fail to accept
when it should.

The above argument is informal, but can be made precise. However, the 
same conclusion, that the language $L_{01}$ is not regular, can be reached using a
general result, as follows. 


Theorem 4.1: (The pumping lemma for regular languages) Let L be a regular
language. Then there exists a constant n (which depends on L) such that for
every string w in L such that $|w| \geq n$, we can break w into three strings,
w = xyz, such that:


1. $y \neq \epsilon$.
2. $|xy| \leq n$.
3. For all $k \geq 0$, the string $xy^kz$ is also in L.


That is, we can always find a nonempty string y not too far from the beginning 
of w that can be “pumped”; that is, repeating y any number of times, or deleting 
it (the case k = 0), keeps the resulting string in the language L. 
PROOF: Suppose L is regular. Then L = L(A) for some DFA A. Suppose A has
n states. Now, consider any string w of length n or more, say
$w = a_1a_2\cdots a_m$, where $m \geq n$ and each $a_i$ is an input symbol. For
$i = 0, 1, \ldots, n$ define state $p_i$ to be $\hat{\delta}(q_0, a_1a_2\cdots a_i)$,
where $\delta$ is the transition function of A, and $q_0$ is the start state of A.
That is, $p_i$ is the state A is in after reading the first i symbols of w.
Note that $p_0 = q_0$.

By the pigeonhole principle, it is not possible for the n + 1 different $p_i$’s
for $i = 0, 1, \ldots, n$ to be distinct, since there are only n different
states. Thus, we can find two different integers i and j, with
$0 \leq i < j \leq n$, such that $p_i = p_j$. Now, we can break w = xyz as
follows:


1. $x = a_1a_2\cdots a_i$.


2. $y = a_{i+1}a_{i+2}\cdots a_j$.


3. $z = a_{j+1}a_{j+2}\cdots a_m$.


That is, x takes us to $p_i$ once; y takes us from $p_i$ back to $p_i$ (since
$p_i$ is also $p_j$), and z is the balance of w. The relationships among the
strings and states are suggested by Fig. 4.1. Note that x may be empty, in the
case that i = 0. Also, z may be empty if j = n = m. However, y cannot be empty,
since i is strictly less than j.


Figure 4.1: Every string longer than the number of states must cause a state 
to repeat 


Now, consider what happens if the automaton A receives the input $xy^kz$ for
any $k \geq 0$. If k = 0, then the automaton goes from the start state $q_0$
(which is also $p_0$) to $p_i$ on input x. Since $p_i$ is also $p_j$, it must be
that A goes from $p_i$ to the accepting state shown in Fig. 4.1 on input z.
Thus, A accepts xz.

If k > 0, then A goes from $q_0$ to $p_i$ on input x, circles from $p_i$ to
$p_i$ k times on input $y^k$, and then goes to the accepting state on input z.
Thus, for any $k \geq 0$, $xy^kz$ is also accepted by A; that is, $xy^kz$ is in L.
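The proof is constructive: given a DFA with n states and a string of length at
least n, it tells us where to cut w. The Python sketch below follows the proof
step by step, computing $p_0, p_1, \ldots$ and stopping at the first repeated
state. The transition-table representation, the function names `run` and
`pumping_split`, and the example automaton (hypothetical states q0, q1, q2,
accepting the strings that end in 01) are our own illustrative assumptions.

```python
# A DFA is given as a total transition table delta[(state, symbol)] -> state.
# Hypothetical 3-state DFA accepting strings over {0, 1} that end in 01.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
START, ACCEPTING, N_STATES = "q0", {"q2"}, 3


def run(delta, start, w):
    """Return the state reached from `start` after reading w (delta-hat)."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q


def pumping_split(delta, start, n, w):
    """Split w = xyz as in the proof of Theorem 4.1.

    n is the number of states; w must have length at least n. We compute
    p_0, p_1, ..., p_n and stop at the first j whose state p_j equals some
    earlier p_i; the pigeonhole principle guarantees this happens by j = n.
    """
    if len(w) < n:
        raise ValueError("w must have length at least n")
    first_index = {start: 0}       # state -> smallest i with p_i = state
    q = start
    for j in range(1, n + 1):
        q = delta[(q, w[j - 1])]    # q is now p_j
        if q in first_index:        # p_i = p_j with 0 <= i < j <= n
            i = first_index[q]
            return w[:i], w[i:j], w[j:]
        first_index[q] = j
    raise AssertionError("unreachable if the DFA really has n states")


x, y, z = pumping_split(DELTA, START, N_STATES, "0001")
print(repr(x), repr(y), repr(z))    # '0' '0' '01'
for k in range(4):
    assert run(DELTA, START, x + y * k + z) in ACCEPTING
```

Because the search stops at the first repeat, the split always satisfies
$|xy| \leq n$; other valid splits may exist, and the lemma only promises that at
least one does.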


4.1.2 Applications of the Pumping Lemma 


Let us see some examples of how the pumping lemma is used. In each case, 
we shall propose a language and use the pumping lemma to prove that the 
language is not regular. 
The Pumping Lemma as an Adversarial Game


Recall our discussion from Section 1.2.3 where we pointed out that a theorem
whose statement involves several alternations of “for-all” and “there-exists”
quantifiers can be thought of as a game between two players. The pumping
lemma is an important example of this type of theorem, since it in effect
involves four different quantifiers: “for all regular languages L there exists
n such that for all w in L with $|w| \geq n$ there exists xyz equal to w such
that $\cdots$.” We can see the application of the pumping lemma as a game, in
which:


1. Player 1 picks the language L to be proved nonregular.


2. Player 2 picks n, but doesn’t reveal to player 1 what n is; player 1
must devise a play for all possible n’s.


3. Player 1 picks w, which may depend on n and which must be of
length at least n.


4. Player 2 divides w into x, y, and z, obeying the constraints that
are stipulated in the pumping lemma; $y \neq \epsilon$ and $|xy| \leq n$.
Again, player 2 does not have to tell player 1 what x, y, and z are,
although they must obey the constraints.


5. Player 1 “wins” by picking k, which may be a function of n, x, y,
and z, such that $xy^kz$ is not in L.


Example 4.2: Let us show that the language $L_{eq}$ consisting of all strings
with an equal number of 0’s and 1’s (not in any particular order) is not a
regular language. In terms of the “two-player game” described in the box on
“The Pumping Lemma as an Adversarial Game,” we shall be player 1 and we must
deal with whatever choices player 2 makes. Suppose n is the constant that must
exist if $L_{eq}$ is regular, according to the pumping lemma; i.e., “player 2”
picks n. We shall pick $w = 0^n1^n$, that is, n 0’s followed by n 1’s, a string
that surely is in $L_{eq}$.

Now, “player 2” breaks our w up into xyz. All we know is that $y \neq \epsilon$,
and $|xy| \leq n$. However, that information is very useful, and we “win” as
follows. Since $|xy| \leq n$, and xy comes at the front of w, we know that x and
y consist only of 0’s. The pumping lemma tells us that xz is in $L_{eq}$, if
$L_{eq}$ is regular. This conclusion is the case k = 0 in the pumping lemma.¹
However, xz has n 1’s, since all the 1’s of w are in z. But xz also has fewer
than n 0’s, because we lost the 0’s of y. Since $y \neq \epsilon$ we know that
there can be no more than $n - 1$ 0’s among x and z. Thus, after assuming
$L_{eq}$ is a regular language, we have proved a fact known to be false, that
xz is in $L_{eq}$. We have a proof by contradiction of the fact that $L_{eq}$ is
not regular.


¹Observe in what follows that we could have also succeeded by picking k = 2, or
indeed any value of k other than 1.
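For a fixed n, the game of Example 4.2 can be checked mechanically. The Python
sketch below, with hypothetical helper names, enumerates every split that
player 2 is allowed to make of player 1’s string and asks whether some k from a
small candidate set takes $xy^kz$ out of the language. Such a check covers only
the particular values of n tried; it illustrates the example but does not
replace the proof, which handles every n at once.

```python
import re


def in_l_eq(s):
    """Membership test for L_eq: equal numbers of 0's and 1's."""
    return s.count("0") == s.count("1")


def legal_splits(w, n):
    """Every way player 2 may write w = xyz with y nonempty and |xy| <= n."""
    for i in range(n):                  # i = |x|
        for j in range(i + 1, n + 1):   # j = |xy|, so y = w[i:j] is nonempty
            yield w[:i], w[i:j], w[j:]


def player1_wins(member, w, n, ks=(0, 1, 2, 3)):
    """True if, for every legal split, some k in ks puts x y^k z outside L."""
    return all(
        any(not member(x + y * k + z) for k in ks)
        for x, y, z in legal_splits(w, n)
    )


for n in range(1, 10):
    w = "0" * n + "1" * n                           # player 1's choice
    assert player1_wins(in_l_eq, w, n, ks=(0,))     # k = 0 always works
    assert player1_wins(in_l_eq, w, n, ks=(2,))     # so does k = 2

# For a regular language such as 0*1*, player 2 has a winning split.
print(player1_wins(lambda s: re.fullmatch("0*1*", s) is not None, "0011", 2))  # False
```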


Example 4.3: Let us show that the language $L_{pr}$ consisting of all strings of
1’s whose length is a prime is not a regular language. Suppose it were. Then
there would be a constant n satisfying the conditions of the pumping lemma.
Consider some prime $p \geq n + 2$; there must be such a p, since there are an
infinity of primes. Let $w = 1^p$.

By the pumping lemma, we can break w = xyz such that $y \neq \epsilon$ and
$|xy| \leq n$. Let $|y| = m$. Then $|xz| = p - m$. Now consider the string
$xy^{p-m}z$, which must be in $L_{pr}$ by the pumping lemma, if $L_{pr}$ really
is regular. However,


$$|xy^{p-m}z| = |xz| + (p-m)|y| = p - m + (p-m)m = (m+1)(p-m)$$


It looks like $|xy^{p-m}z|$ is not a prime, since it has two factors $m + 1$ and
$p - m$. However, we must check that neither of these factors is 1, since then
$(m+1)(p-m)$ might be a prime after all. But $m + 1 > 1$, since $y \neq \epsilon$
tells us $m \geq 1$. Also, $p - m > 1$, since $p \geq n + 2$ was chosen, and
$m \leq n$ since


$$m = |y| \leq |xy| \leq n$$


Thus, $p - m \geq 2$.

Again we have started by assuming the language in question was regular,
and we derived a contradiction by showing that some string not in the language
was required by the pumping lemma to be in the language. Thus, we conclude
that $L_{pr}$ is not a regular language.
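The arithmetic in Example 4.3 can be spot-checked numerically. This Python
sketch (the function names are our own) picks the smallest prime $p \geq n + 2$
and verifies, for each allowed $m = |y|$ with $1 \leq m \leq n$, that the pumped
length equals $(m+1)(p-m)$ and is composite. It is a sanity check for small n,
not a proof.

```python
def is_prime(m):
    """Trial-division primality test (adequate for the small values used here)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def next_prime(lo):
    """Smallest prime >= lo."""
    p = lo
    while not is_prime(p):
        p += 1
    return p


def check_example_4_3(n):
    """Check the length computation of Example 4.3 for one value of n.

    With |w| = p (a prime >= n + 2) and |y| = m, 1 <= m <= n, the string
    x y^(p-m) z has length |xz| + (p - m)|y| = (m + 1)(p - m), which is
    composite because both factors are at least 2.
    """
    p = next_prime(n + 2)
    for m in range(1, n + 1):
        length = (p - m) + (p - m) * m
        assert length == (m + 1) * (p - m)
        assert m + 1 >= 2 and p - m >= 2
        assert not is_prime(length)
    return True


assert all(check_example_4_3(n) for n in range(1, 50))
```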


4.1.3 Exercises for Section 4.1 
Exercise 4.1.1: Prove that the following are not regular languages. 


a) $\{0^n1^n \mid n \geq 1\}$. This language, consisting of a string of 0’s
followed by an equal-length string of 1’s, is the language $L_{01}$ we
considered informally at the beginning of the section. Here, you should apply
the pumping lemma in the proof.


b) The set of strings of balanced parentheses. These are the strings of
characters “(” and “)” that can appear in a well-formed arithmetic expression.


* c) $\{0^n10^n \mid n \geq 1\}$.
d) $\{0^n1^m2^n \mid n \text{ and } m \text{ are arbitrary integers}\}$.
e) $\{0^n1^m \mid n \leq m\}$.
f) $\{0^n1^{2n} \mid n \geq 1\}$.
! Exercise 4.1.2: Prove that the following are not regular languages.
* a) $\{0^n \mid n \text{ is a perfect square}\}$.
b) $\{0^n \mid n \text{ is a perfect cube}\}$.
c) $\{0^n \mid n \text{ is a power of 2}\}$.
d) The set of strings of 0’s and 1’s whose length is a perfect square.


e) The set of strings of 0’s and 1’s that are of the form ww, that is, some
string repeated.


f) The set of strings of 0’s and 1’s that are of the form $ww^R$, that is, some
string followed by its reverse. (See Section 4.2.2 for a formal definition of
the reversal of a string.)


g) The set of strings of 0’s and 1’s of the form $w\overline{w}$, where
$\overline{w}$ is formed from w by replacing all 0’s by 1’s, and vice-versa;
e.g., $\overline{011} = 100$, and 011100 is an example of a string in the
language.


h) The set of strings of the form $w1^n$, where w is a string of 0’s and 1’s of
length n.


!! Exercise 4.1.3: Prove that the following are not regular languages. 


a) The set of strings of 0’s and 1’s, beginning with a 1, such that when 
interpreted as an integer, that integer is a prime. 


b) The set of strings of the form $0^i1^j$ such that the greatest common divisor
of i and j is 1.


! Exercise 4.1.4: When we try to apply the pumping lemma to a regular lan- 
guage, the “adversary wins,” and we cannot complete the proof. Show what 
goes wrong when we choose L to be one of the following languages: 

* a) The empty set. 
* b) {00, 11}. 


* c) $(00 + 11)^*$.


d) $01^*0^*1$.
4.2 Closure Properties of Regular Languages


In this section, we shall prove several theorems of the form “if certain languages 
are regular, and a language L is formed from them by certain operations (e.g., L 
is the union of two regular languages), then L is also regular.” These theorems 
are often called closure properties of the regular languages, since they show that 
the class of regular languages is closed under the operation mentioned. Closure 
properties express the idea that when one (or several) languages are regular, 
then certain related languages are also regular. They also serve as an interest- 
ing illustration of how the equivalent representations of the regular languages 
(automata and regular expressions) reinforce each other in our understanding 
of the class of languages, since often one representation is far better than the 
others in supporting a proof of a closure property. Here is a summary of the 
principal closure properties for regular languages: 


1. The union of two regular languages is regular.
2. The intersection of two regular languages is regular.
3. The complement of a regular language is regular.
4. The difference of two regular languages is regular.
5. The reversal of a regular language is regular.
6. The closure (star) of a regular language is regular.
7. The concatenation of regular languages is regular.
8. A homomorphism (substitution of strings for symbols) of a regular language
is regular.
9. The inverse homomorphism of a regular language is regular.


4.2.1 Closure of Regular Languages Under Boolean Operations


Our first closure properties are the three boolean operations: union, intersec- 
tion, and complementation: 


1. Let L and M be languages over alphabet $\Sigma$. Then $L \cup M$ is the
language that contains all strings that are in either or both of L and M.


2. Let L and M be languages over alphabet $\Sigma$. Then $L \cap M$ is the
language that contains all strings that are in both L and M.


3. Let L be a language over alphabet $\Sigma$. Then $\overline{L}$, the
complement of L, is the set of strings in $\Sigma^*$ that are not in L.


It turns out that the regular languages are closed under all three of the 
boolean operations. The proofs take rather different approaches though, as we 
shall see. 
What if Languages Have Different Alphabets?


When we take the union or intersection of two languages L and M, they
might have different alphabets. For example, it is possible that
$L_1 \subseteq \{a,b\}^*$ while $L_2 \subseteq \{b,c,d\}^*$. However, if a
language L consists of strings with symbols in $\Sigma$, then we can also think
of L as a language over any finite alphabet that is a superset of $\Sigma$.
Thus, for example, we can think of both $L_1$ and $L_2$ above as being
languages over alphabet $\{a,b,c,d\}$. The fact that none of $L_1$’s strings
contain symbols c or d is irrelevant, as is the fact that $L_2$’s strings will
not contain a.

Likewise, when taking the complement of a language L that is a subset
of $\Sigma_1^*$ for some alphabet $\Sigma_1$, we may choose to take the
complement with respect to some alphabet $\Sigma_2$ that is a superset of
$\Sigma_1$. If so, then the complement of L will be $\Sigma_2^* - L$; that is,
the complement of L with respect to $\Sigma_2$ includes (among other strings)
all those strings in $\Sigma_2^*$ that have at least one symbol that is in
$\Sigma_2$ but not in $\Sigma_1$. Had we taken the complement of L with respect
to $\Sigma_1$, then no string with symbols in $\Sigma_2 - \Sigma_1$ would be in
$\overline{L}$. Thus, to be strict, we should always state the alphabet with respect to
which a complement is taken. However, often it is obvious which alphabet 
is meant; e.g., if L is defined by an automaton, then the specification of 
that automaton includes the alphabet. Thus, we shall often speak of the 
“complement” without specifying the alphabet. 


Closure Under Union 
Theorem 4.4: If L and M are regular languages, then so is $L \cup M$.


PROOF: This proof is simple. Since L and M are regular, they have regular 
expressions; say L = L(R) and M = L(S). Then $L \cup M = L(R + S)$ by the
definition of the + operator for regular expressions.
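Python’s `re` module offers a convenient way to experiment with this idea, with
`|` playing the role of the + operator. The sketch assumes the patterns use
only the regular-expression core (symbols, grouping, `|`, `*`), since Python
regexes support extra features that go beyond regular languages; the helper
names are hypothetical. `re.fullmatch` is used so that the whole string, not
just a substring, must match.

```python
import re


def union_pattern(r, s):
    """Pattern for L(R) ∪ L(S): the '+' of the text is '|' in Python's re."""
    return f"(?:{r})|(?:{s})"


def in_language(pattern, w):
    # fullmatch: w itself must be in the language, not merely contain a match
    return re.fullmatch(pattern, w) is not None


R = "(?:0|1)*01"    # (0+1)*01: strings ending in 01
S = "1*"            # 1*: strings of 1's, including the empty string
U = union_pattern(R, S)
for w in ["1101", "111", "", "10"]:
    print(repr(w), in_language(U, w))
```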


Closure Under Complementation 


The theorem for union was made very easy by the use of the regular-expression 
representation for the languages. However, let us next consider complemen- 
tation. Do you see how to take a regular expression and change it into one 
that defines the complement language? Well, neither do we. However, it can be
done, because as we shall see in Theorem 4.5, it is easy to start with a DFA and 
construct a DFA that accepts the complement. Thus, starting with a regular 
expression, we could find a regular expression for its complement as follows: 


1. Convert the regular expression to an $\epsilon$-NFA.


2. Convert that $\epsilon$-NFA to a DFA by the subset construction.


3. Complement the accepting states of that DFA.


4. Turn the complement DFA back into a regular expression using the
construction of Sections 3.2.1 or 3.2.2.


Closure Under Regular Operations


The proof that regular languages are closed under union was exceptionally
easy because union is one of the three operations that define the regular
expressions. The same idea as Theorem 4.4 applies to concatenation and
closure as well. That is:


• If L and M are regular languages, then so is LM.


• If L is a regular language, then so is $L^*$.


Theorem 4.5: If L is a regular language over alphabet $\Sigma$, then
$\overline{L} = \Sigma^* - L$ is also a regular language.


PROOF: Let L = L(A) for some DFA $A = (Q, \Sigma, \delta, q_0, F)$. Then
$\overline{L} = L(B)$, where B is the DFA $(Q, \Sigma, \delta, q_0, Q - F)$.
That is, B is exactly like A, but the accepting states of A have become
nonaccepting states of B, and vice versa. Then w is in L(B) if and only if
$\hat{\delta}(q_0, w)$ is in $Q - F$, which occurs if and only if w is not in
L(A).


Notice that it is important for the above proof that $\hat{\delta}(q_0, w)$ is
always some state; i.e., there are no missing transitions in A. If there were,
then certain strings might lead neither to an accepting nor nonaccepting state
of A, and those strings would be missing from both L(A) and L(B). Fortunately,
we have defined a DFA to have a transition on every symbol of $\Sigma$ from
every state, so each string leads either to a state in F or a state in $Q - F$.


Example 4.6: Let A be the automaton of Fig. 2.14. Recall that DFA A accepts
all and only the strings of 0’s and 1’s that end in 01; in regular-expression
terms, $L(A) = (0 + 1)^*01$. The complement of L(A) is therefore all strings
of 0’s and 1’s that do not end in 01. Figure 4.2 shows the automaton for
$\{0,1\}^* - L(A)$. It is the same as Fig. 2.14 but with the accepting state
made nonaccepting and the two nonaccepting states made accepting.
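Theorem 4.5 and Example 4.6 translate directly into code. In this Python sketch
(the `DFA` class and the `complement` function are illustrative, not from the
text), complementation only swaps the accepting set, after first checking that
the transition function is total, which is exactly the requirement noted after
the proof. The example is a three-state DFA for $(0 + 1)^*01$ playing the role
of the automaton of Fig. 2.14; its state names s0, s1, s2 are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict          # (state, symbol) -> state; must be total
    start: str
    accepting: frozenset

    def accepts(self, w):
        q = self.start
        for a in w:
            q = self.delta[(q, a)]
        return q in self.accepting


def complement(dfa):
    """Theorem 4.5: the same DFA with accepting set Q - F."""
    for q in dfa.states:
        for a in dfa.alphabet:
            if (q, a) not in dfa.delta:
                raise ValueError(f"no transition from {q!r} on {a!r}")
    return DFA(dfa.states, dfa.alphabet, dfa.delta, dfa.start,
               dfa.states - dfa.accepting)


# s0 = start, s1 = "last symbol was 0", s2 = "input so far ends in 01".
A = DFA(
    states=frozenset({"s0", "s1", "s2"}),
    alphabet=frozenset({"0", "1"}),
    delta={("s0", "0"): "s1", ("s0", "1"): "s0",
           ("s1", "0"): "s1", ("s1", "1"): "s2",
           ("s2", "0"): "s1", ("s2", "1"): "s0"},
    start="s0",
    accepting=frozenset({"s2"}),
)
B = complement(A)

# Exhaustive check on all strings of length at most 6.
for n in range(7):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert A.accepts(w) == w.endswith("01")
        assert B.accepts(w) == (not w.endswith("01"))
```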


Figure 4.2: DFA accepting the complement of the language $(0 + 1)^*01$


Example 4.7: In this example, we shall apply Theorem 4.5 to show a certain
language not to be regular. In Example 4.2 we showed that the language $L_{eq}$
consisting of strings with an equal number of 0’s and 1’s is not regular. This
proof was a straightforward application of the pumping lemma. Now consider
the language M consisting of those strings of 0’s and 1’s that have an unequal
number of 0’s and 1’s.

It would be hard to use the pumping lemma to show M is not regular. 
Intuitively, if we start with some string w in M, break it into w = xyz, and 
“pump” y, we might find that y itself was a string like 01 that had an equal 
number of 0’s and 1’s. If so, then for no k will $xy^kz$ have an equal number of 0’s
and 1’s, since xyz has an unequal number of 0’s and 1’s, and the numbers of 0’s 
and 1’s change equally as we “pump” y. Thus, we can never use the pumping 
lemma to contradict the assumption that M is regular. 

However, M is still not regular. The reason is that $M = \overline{L_{eq}}$.
Since the complement of the complement is the set we started with, it also
follows that $L_{eq} = \overline{M}$. If M is regular, then by Theorem 4.5,
$L_{eq}$ is regular. But we know $L_{eq}$ is not regular, so we have a proof by
contradiction that M is not regular.


Closure Under Intersection 


Now, let us consider the intersection of two regular languages. We actually 
have little to do, since the three boolean operations are not independent. Once 
we have ways of performing complementation and union, we can obtain the 
intersection of languages L and M by the identity 


$$L \cap M = \overline{\overline{L} \cup \overline{M}} \tag{4.1}$$


In general, the intersection of two sets is the set of elements that are not in 
the complement of either set. That observation, which is what Equation (4.1) 
says, is one of DeMorgan’s laws. The other law is the same with union and 
intersection interchanged; that is, $L \cup M = \overline{\overline{L} \cap \overline{M}}$.

However, we can also perform a direct construction of a DFA for the in- 
tersection of two regular languages. This construction, which essentially runs 
two DFA’s in parallel, is useful in its own right. For instance, we used it to 
construct the automaton in Fig. 2.3 that represented the “product” of what 
two participants — the bank and the store — were doing. We shall make the 
product construction formal in the next theorem. 


Theorem 4.8: If L and M are regular languages, then so is $L \cap M$.
PROOF: Let L and M be the languages of automata
$A_L = (Q_L, \Sigma, \delta_L, q_L, F_L)$ and
$A_M = (Q_M, \Sigma, \delta_M, q_M, F_M)$. Notice that we are assuming that the
alphabets of both automata are the same; that is, $\Sigma$ is the union of the
alphabets of L and M, if those alphabets are different. The product
construction actually works for NFA’s as well as DFA’s, but to make the
argument as simple as possible, we assume that $A_L$ and $A_M$ are DFA’s.

For $L \cap M$ we shall construct an automaton A that simulates both $A_L$ and
$A_M$. The states of A are pairs of states, the first from $A_L$ and the second
from $A_M$. To design the transitions of A, suppose A is in state (p, q), where
p is the state of $A_L$ and q is the state of $A_M$. If a is the input symbol,
we see what $A_L$ does on input a; say it goes to state s. We also see what
$A_M$ does on input a; say it makes a transition to state t. Then the next
state of A will be (s, t). In that manner, A has simulated the effect of both
$A_L$ and $A_M$. The idea is sketched in Fig. 4.3.


Figure 4.3: An automaton simulating two other automata and accepting if and
only if both accept


The remaining details are simple. The start state of A is the pair of start
states of $A_L$ and $A_M$. Since we want to accept if and only if both automata
accept, we select as the accepting states of A all those pairs (p, q) such that
p is an accepting state of $A_L$ and q is an accepting state of $A_M$. Formally,
we define:


$$A = (Q_L \times Q_M, \Sigma, \delta, (q_L, q_M), F_L \times F_M)$$
where $\delta((p, q), a) = (\delta_L(p, a), \delta_M(q, a))$.

To see why $L(A) = L(A_L) \cap L(A_M)$, first observe that an easy induction
on |w| proves that
$\hat{\delta}((q_L, q_M), w) = (\hat{\delta}_L(q_L, w), \hat{\delta}_M(q_M, w))$.
But A accepts w if and only if $\hat{\delta}((q_L, q_M), w)$ is a pair of
accepting states. That is, $\hat{\delta}_L(q_L, w)$ must be in $F_L$, and
$\hat{\delta}_M(q_M, w)$ must be in $F_M$. Put another way, w is accepted by A
if and only if both $A_L$ and $A_M$ accept w. Thus, A accepts the intersection
of L and M.
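The product construction is short to implement. The following Python sketch
builds the transition table of A from two total DFA tables and then reproduces
Example 4.9, using the state names p, q (for the automaton that checks for a 0)
and r, s (for the one that checks for a 1) mentioned there. The exact
transitions are our reading of the descriptions “has a 0” and “has a 1,” and
the function names are hypothetical.

```python
from itertools import product


def product_dfa(delta_l, start_l, final_l, delta_m, start_m, final_m, alphabet):
    """Product construction of Theorem 4.8.

    States are pairs (p, q); delta((p, q), a) = (delta_L(p, a), delta_M(q, a));
    the start state is (q_L, q_M); the accepting states are F_L x F_M.
    Both tables are assumed total, so every state appears as a key.
    """
    states_l = {p for (p, _) in delta_l}
    states_m = {q for (q, _) in delta_m}
    delta = {
        ((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
        for p, q in product(states_l, states_m)
        for a in alphabet
    }
    final = set(product(final_l, final_m))
    return delta, (start_l, start_m), final


def accepts(delta, start, final, w):
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in final


# Example 4.9: (a) "has a 0" with states p, q; (b) "has a 1" with states r, s.
DELTA_A = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
DELTA_B = {("r", "0"): "r", ("r", "1"): "s", ("s", "0"): "s", ("s", "1"): "s"}
delta, start, final = product_dfa(DELTA_A, "p", {"q"}, DELTA_B, "r", {"s"}, "01")

print(delta[(("p", "r"), "0")])   # ('q', 'r')
print(final)                      # {('q', 's')}
```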


Example 4.9: In Fig. 4.4 we see two DFA’s. The automaton in Fig. 4.4(a) 
accepts all those strings that have a 0, while the automaton in Fig. 4.4(b) 
accepts all those strings that have a 1. We show in Fig. 4.4(c) the product of 
these two automata. Its states are labeled by the pairs of states of the automata 
in (a) and (b). 




Figure 4.4: The product construction 


It is easy to argue that this automaton accepts the intersection of the first 
two languages: those strings that have both a 0 and a 1. State pr represents 
only the initial condition, in which we have seen neither 0 nor 1. State qr means
that we have seen only 0’s, while state ps represents the condition that we have 
seen only 1’s. The accepting state qs represents the condition where we have 
seen both 0’s and 1’s. 


Closure Under Difference 


There is a fourth operation that is often applied to sets and is related to the 
boolean operations: set difference. In terms of languages, $L - M$, the difference
of L and M, is the set of strings that are in language L but not in language 
M. The regular languages are also closed under this operation, and the proof 
follows easily from the theorems just proven. 
Theorem 4.10: If L and M are regular languages, then so is $L - M$.


PROOF: Observe that $L - M = L \cap \overline{M}$. By Theorem 4.5,
$\overline{M}$ is regular, and by Theorem 4.8 $L \cap \overline{M}$ is regular.
Therefore $L - M$ is regular.
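The difference can also be computed by running the two DFA’s in parallel,
without building the complement and the intersection as separate automata. The
sketch below (a hypothetical helper, not from the text) simulates the product
automaton of Theorem 4.8 on the fly, but accepts on the pairs in
$F_L \times (Q_M - F_M)$, which is the accepting set of the product of $A_L$
with the complement DFA of Theorem 4.5. The two example automata, chosen for
illustration, recognize “contains a 0” and “contains a 1.”

```python
def accepts_difference(delta_l, start_l, final_l, delta_m, start_m, final_m, w):
    """Run A_L and A_M side by side; accept iff w is in L(A_L) - L(A_M).

    The pair (p, q) is the current state of the product automaton; it is
    accepting for L - M iff p is in F_L and q is not in F_M.
    """
    p, q = start_l, start_m
    for a in w:
        p, q = delta_l[(p, a)], delta_m[(q, a)]
    return p in final_l and q not in final_m


# L = strings with a 0, M = strings with a 1,
# so L - M is the set of nonempty strings consisting only of 0's.
DELTA_L = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
DELTA_M = {("r", "0"): "r", ("r", "1"): "s", ("s", "0"): "s", ("s", "1"): "s"}

for w in ["", "0", "000", "01", "1", "100"]:
    print(repr(w), accepts_difference(DELTA_L, "p", {"q"}, DELTA_M, "r", {"s"}, w))
```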


4.2.2 Reversal 


The reversal of a string $a_1a_2\cdots a_n$ is the string written backwards,
that is, $a_na_{n-1}\cdots a_1$. We use $w^R$ for the reversal of string w.
Thus, $0010^R$ is 0100, and $\epsilon^R = \epsilon$.

The reversal of a language L, written $L^R$, is the language consisting of the
reversals of all its strings. For instance, if L = {001, 10, 111}, then
$L^R = \{100, 01, 111\}$.

Reversal is another operation that preserves regular languages; that is, if 
L is a regular language, so is $L^R$. There are two simple proofs, one based on
automata and one based on regular expressions. We shall give the automaton- 
based proof informally, and let you fill in the details if you like. We then prove 
the theorem formally using regular expressions. 

Given a language L that is L(A) for some finite automaton, perhaps with
nondeterminism and $\epsilon$-transitions, we may construct an automaton for
$L^R$ by:


1. Reverse all the arcs in the transition diagram for A.


2. Make the start state of A be the only accepting state for the new
automaton.


3. Create a new start state $p_0$ with transitions on $\epsilon$ to all the
accepting states of A.


The result is an automaton that simulates A “in reverse,” and therefore accepts 
a string w if and only if A accepts $w^R$. Now, we prove the reversal theorem
formally. 
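The three-step construction can be sketched in Python with an $\epsilon$-NFA
represented as a set of (source, label, target) arcs, using the empty string as
the label of an $\epsilon$-transition. That representation, the helper names,
and the fresh start-state name p0 are assumptions made for illustration; the
new start name must not clash with an existing state. The example NFA, with
hypothetical states q0, q1, q2, accepts strings ending in 01, so its reversal
should accept exactly the strings beginning with 10.

```python
EPS = ""   # label used for epsilon-transitions


def eclose(arcs, states):
    """All states reachable from `states` using only epsilon-arcs."""
    closed, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (p, a, q) in arcs:
            if p == s and a == EPS and q not in closed:
                closed.add(q)
                stack.append(q)
    return closed


def nfa_accepts(arcs, start, accepting, w):
    """Simulate an epsilon-NFA given as a set of (source, label, target) arcs."""
    current = eclose(arcs, {start})
    for c in w:
        current = eclose(arcs, {q for (p, a, q) in arcs if p in current and a == c})
    return bool(current & set(accepting))


def reverse_nfa(arcs, start, accepting, new_start="p0"):
    """Build an epsilon-NFA for the reversal; new_start must be a fresh name."""
    rev = {(q, a, p) for (p, a, q) in arcs}           # 1. reverse every arc
    rev |= {(new_start, EPS, f) for f in accepting}   # 3. epsilon to old finals
    return rev, new_start, {start}                    # 2. old start accepts


ARCS = {("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "0", "q1"), ("q1", "1", "q2")}
rev_arcs, rev_start, rev_final = reverse_nfa(ARCS, "q0", {"q2"})
print(nfa_accepts(rev_arcs, rev_start, rev_final, "1011"))   # True
```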


Theorem 4.11: If L is a regular language, so is $L^R$.


PROOF: Assume L is defined by regular expression E. The proof is a structural
induction on the size of E. We show that there is another regular expression
$E^R$ such that $L(E^R) = (L(E))^R$; that is, the language of $E^R$ is the
reversal of the language of E.


BASIS: If E is $\epsilon$, $\emptyset$, or a, for some symbol a, then $E^R$ is
the same as E. That is, we know $\{\epsilon\}^R = \{\epsilon\}$,
$\emptyset^R = \emptyset$, and $\{a\}^R = \{a\}$.


INDUCTION: There are three cases, depending on the form of E.
1. $E = E_1 + E_2$. Then $E^R = E_1^R + E_2^R$. The justification is that the
reversal of the union of two languages is obtained by computing the reversals
of the two languages and taking the union of those languages.
2. $E = E_1E_2$. Then $E^R = E_2^RE_1^R$. Note that we reverse the order of
the two languages, as well as reversing the languages themselves. For
instance, if $L(E_1) = \{01, 111\}$ and $L(E_2) = \{00, 10\}$, then
$L(E_1E_2) = \{0100, 0110, 11100, 11110\}$. The reversal of the latter
language is


$$\{0010, 0110, 00111, 01111\}$$
If we concatenate the reversals of $L(E_2)$ and $L(E_1)$ in that order, we get


$$\{00, 01\}\{10, 111\} = \{0010, 00111, 0110, 01111\}$$


which is the same language as $(L(E_1E_2))^R$. In general, if a word w in
L(E) is the concatenation of $w_1$ from $L(E_1)$ and $w_2$ from $L(E_2)$, then
$w^R = w_2^Rw_1^R$.
3. $E = E_1^*$. Then $E^R = (E_1^R)^*$. The justification is that any string w
in L(E) can be written as $w_1w_2\cdots w_n$, where each $w_i$ is in $L(E_1)$.
But


$$w^R = w_n^Rw_{n-1}^R\cdots w_1^R$$
Each $w_i^R$ is in $L(E_1^R)$, so $w^R$ is in $L((E_1^R)^*)$. Conversely, any
string in $L((E_1^R)^*)$ is of the form $w_1w_2\cdots w_n$, where each $w_i$ is
the reversal of a string in $L(E_1)$. The reversal of this string,
$w_n^Rw_{n-1}^R\cdots w_1^R$, is therefore a string in $L(E_1^*)$, which is
L(E). We have thus shown that a string is in L(E) if and only if its reversal
is in $L((E_1^R)^*)$.
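The induction in this proof is really a recursive algorithm on the syntax tree
of E. The Python sketch below uses our own AST classes (not notation from the
text), has one case per basis and inductive rule, and includes a translation
into Python `re` patterns that is used only to test membership. Its usage line
reproduces the result of Example 4.12.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:            # the regular expression epsilon
    pass


@dataclass(frozen=True)
class Empty:          # the regular expression for the empty language
    pass


@dataclass(frozen=True)
class Sym:            # a single input symbol a
    a: str


@dataclass(frozen=True)
class Alt:            # E1 + E2
    left: object
    right: object


@dataclass(frozen=True)
class Cat:            # E1 E2
    left: object
    right: object


@dataclass(frozen=True)
class Star:           # E1*
    body: object


def reverse(e):
    """Build E^R by structural induction, one case per rule of Theorem 4.11."""
    if isinstance(e, (Eps, Empty, Sym)):
        return e                                        # basis: E^R = E
    if isinstance(e, Alt):
        return Alt(reverse(e.left), reverse(e.right))   # E1^R + E2^R
    if isinstance(e, Cat):
        return Cat(reverse(e.right), reverse(e.left))   # E2^R E1^R
    if isinstance(e, Star):
        return Star(reverse(e.body))                    # (E1^R)*
    raise TypeError(f"not a regular expression: {e!r}")


def to_python(e):
    """Translate to a Python re pattern (used here only to test membership)."""
    if isinstance(e, Eps):
        return "(?:)"
    if isinstance(e, Empty):
        return "(?!)"            # a pattern that matches nothing
    if isinstance(e, Sym):
        return re.escape(e.a)
    if isinstance(e, Alt):
        return f"(?:{to_python(e.left)}|{to_python(e.right)})"
    if isinstance(e, Cat):
        return to_python(e.left) + to_python(e.right)
    if isinstance(e, Star):
        return f"(?:{to_python(e.body)})*"
    raise TypeError(f"not a regular expression: {e!r}")


# Example 4.12: (0 + 1)0* reverses to 0*(0 + 1).
E = Cat(Alt(Sym("0"), Sym("1")), Star(Sym("0")))
print(reverse(E) == Cat(Star(Sym("0")), Alt(Sym("0"), Sym("1"))))   # True
```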


Example 4.12: Let L be defined by the regular expression $(0 + 1)0^*$. Then
$L^R$ is the language of $(0^*)^R(0 + 1)^R$, by the rule for concatenation. If
we apply the rules for closure and union to the two parts, and then apply the
basis rule that says the reversals of 0 and 1 are unchanged, we find that $L^R$
has regular expression $0^*(0 + 1)$.


4.2.3 Homomorphisms 


A string homomorphism is a function on strings that works by substituting a 
particular string for each symbol. 


Example 4.13: The function h defined by h(0) = ab and h(1) = $\epsilon$ is a
homomorphism. Given any string of 0’s and 1’s, it replaces all 0’s by the string ab
and replaces all 1’s by the empty string. For example, h applied to the string 
0011 is abab. 


Formally, if h is a homomorphism on alphabet $\Sigma$, and $w = a_1a_2\cdots a_n$
is a string of symbols in $\Sigma$, then $h(w) = h(a_1)h(a_2)\cdots h(a_n)$.
That is, we apply h to each symbol of w and concatenate the results, in order.
For instance, if h is the homomorphism in Example 4.13, and w = 0011, then
$h(w) = h(0)h(0)h(1)h(1) = (ab)(ab)(\epsilon)(\epsilon) = abab$, as we claimed
in that example.

Further, we can apply a homomorphism to a language by applying it to 
each of the strings in the language. That is, if L is a language over alphabet
$\Sigma$, and h is a homomorphism on $\Sigma$, then
$h(L) = \{h(w) \mid w \text{ is in } L\}$. For instance, if L is the language of
regular expression $10^*1$, i.e., any number of 0’s surrounded by single 1’s,
then h(L) is the language $(ab)^*$. The reason is that h of Example 4.13
effectively drops the 1’s, since they are replaced by $\epsilon$,
and turns each 0 into ab. The same idea, applying the homomorphism directly
to the regular expression, can be used to prove that the regular languages are 
closed under homomorphisms. 
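Applying a homomorphism to a string, or to a finite set of strings, is a simple
mapping. The sketch below represents h as a Python dictionary from symbols to
strings (an illustrative choice) and uses the homomorphism of Example 4.13. For
an infinite language such as $L(10^*1)$ we can only apply h to finitely many
members, so the last line samples three strings from that language.

```python
def apply_hom(h, w):
    """h(a1 a2 ... an) = h(a1) h(a2) ... h(an); h maps each symbol to a string."""
    return "".join(h[a] for a in w)


def apply_hom_to_language(h, language):
    """h(L) = {h(w) | w in L}, computed here for a finite set of strings."""
    return {apply_hom(h, w) for w in language}


H = {"0": "ab", "1": ""}      # Example 4.13: h(0) = ab, h(1) = epsilon

print(apply_hom(H, "0011"))                                     # abab
print(sorted(apply_hom_to_language(H, {"11", "101", "1001"})))  # ['', 'ab', 'abab']
```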


Theorem 4.14: If L is a regular language over alphabet $\Sigma$, and h is a
homomorphism on $\Sigma$, then h(L) is also regular.


PROOF: Let L = L(R) for some regular expression R. In general, if E is a
regular expression with symbols in $\Sigma$, let h(E) be the expression we
obtain by replacing each symbol a of $\Sigma$ in E by h(a). We claim that h(R)
defines the language h(L).

The proof is an easy structural induction that says whenever we take a
subexpression E of R and apply h to it to get h(E), the language of h(E)
is the same language we get if we apply h to the language L(E). Formally,
L(h(E)) = h(L(E)).


BASIS: If E is $\epsilon$ or $\emptyset$, then h(E) is the same as E, since h
does not affect the string $\epsilon$ or the language $\emptyset$. Thus,
L(h(E)) = L(E). However, if E is $\emptyset$ or $\epsilon$, then L(E) contains
either no strings or a string with no symbols, respectively. Thus
h(L(E)) = L(E) in either case. We conclude L(h(E)) = L(E) = h(L(E)).

The only other basis case is if E = a for some symbol a in $\Sigma$. In this case,
L(E) = {a}, so h(L(E)) = {h(a)}. Also, h(E) is the regular expression that 
is the string of symbols h(a). Thus, L(h(E)) is also {h(a)}, and we conclude 
L(h(E)) = h(L(E)). 


INDUCTION: There are three cases, each of them simple. We shall prove only 
the union case, where E = F+G. The way we apply homomorphisms to regular 
expressions assures us that h(E) = h(F + G) = h(F) + h(G). We also know 
that $L(E) = L(F) \cup L(G)$ and


$$L(h(E)) = L(h(F) + h(G)) = L(h(F)) \cup L(h(G)) \tag{4.2}$$
by the definition of what “+” means in regular expressions. Finally,
$$h(L(E)) = h(L(F) \cup L(G)) = h(L(F)) \cup h(L(G)) \tag{4.3}$$


because h is applied to a language by application to each of its strings individ- 
ually. Now we may invoke the inductive hypothesis to assert that L(h(F)) = 
h(L(F)) and L(h(G)) = h(L(G)). Thus, the final expressions in (4.2) and 
(4.3) are equivalent, and therefore so are their respective first terms; that is,
L(h(E)) = h(L(E)). 

We shall not prove the cases where expression E is a concatenation or clo- 
sure; the ideas are similar to the above in both cases. The conclusion is that 
L(h(R)) is indeed h(L(R)); i.e., applying the homomorphism h to the regu- 
lar expression for language L results in a regular expression that defines the 
language h(L). 
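The construction in this proof, replacing each symbol a of the expression by
h(a), can be sketched on Python regex strings. The sketch assumes,
hypothetically, that the pattern uses single-character symbols and only the
operators `(`, `)`, `|` and `*` (with `|` standing for the text’s +). Each image
h(a) is wrapped in a non-capturing group `(?:...)` so that a following `*`
applies to the whole of h(a), and an $\epsilon$ image becomes the empty group.

```python
import re

OPERATORS = set("()|*")   # Python spelling of the regular-expression operators


def hom_of_pattern(h, pattern):
    """Build h(E) from E by replacing each symbol a with the group (?:h(a))."""
    out = []
    for c in pattern:
        if c in OPERATORS:
            out.append(c)
        else:
            out.append("(?:" + re.escape(h[c]) + ")")   # h(a) may be empty
    return "".join(out)


H = {"0": "ab", "1": ""}     # the homomorphism of Example 4.13
h_pattern = hom_of_pattern(H, "10*1")
print(h_pattern)             # (?:)(?:ab)*(?:)
```

Applied to $10^*1$, the result denotes $(ab)^*$, matching the claim about h(L)
made before Theorem 4.14.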


4.2.4 Inverse Homomorphisms 


Homomorphisms may also be applied “backwards,” and in this mode they also
preserve regular languages. That is, suppose h is a homomorphism from some
alphabet $\Sigma$ to strings in another (possibly the same) alphabet T.² Let L
be a language over alphabet T. Then $h^{-1}(L)$, read “h inverse of L,” is the
set of strings w in $\Sigma^*$ such that h(w) is in L. Figure 4.5 suggests the
effect of a homomorphism on a language L in part (a), and the effect of an
inverse homomorphism in part (b).


Figure 4.5: A homomorphism applied in the forward and inverse direction


Example 4.15: Let L be the language of regular expression (00 + 1)*. That 
is, L consists of all strings of 0’s and 1’s such that all the 0’s occur in adjacent 
pairs. Thus, 0010011 and 10000111 are in L, but 000 and 10100 are not. 


²That “T” should be thought of as a Greek capital tau, the letter following sigma.
Let h be the homomorphism defined by h(a) = 01 and h(b) = 10. We claim
that $h^{-1}(L)$ is the language of regular expression $(ba)^*$, that is, all
strings of repeating ba pairs. We shall prove that h(w) is in L if and only if
w is of the form $baba\cdots ba$.


(If) Suppose w is n repetitions of ba for some n ≥ 0. Note that h(ba) = 1001,
so h(w) is n repetitions of 1001. Since 1001 is composed of two 1’s and a pair of
0’s, we know that 1001 is in L. Therefore any repetition of 1001 is also formed
from 1 and 00 segments and is in L. Thus, h(w) is in L.


(Only-if) Now, we must assume that h(w) is in L and show that w is of the
form baba···ba. There are four conditions under which a string is not of that
form, and we shall show that if any of them hold then h(w) is not in L. That
is, we prove the contrapositive of the statement we set out to prove.


1. If w begins with a, then h(w) begins with 01. It therefore has an isolated
0, and is not in L.


2. If w ends in b, then h(w) ends in 10, and again there is an isolated 0 in 
h(w). 


3. If w has two consecutive a’s, then h(w) has a substring 0101. Here too,
there is an isolated 0 in h(w).


4. Likewise, if w has two consecutive b’s, then h(w) has substring 1010 and 
has an isolated 0. 


Thus, whenever one of the above cases holds, h(w) is not in L. However, unless
at least one of items (1) through (4) holds, w is of the form baba···ba.
To see why, assume none of (1) through (4) hold. Then (1) tells us w must
begin with b, and (2) tells us w ends with a. Statements (3) and (4) tell us that
a’s and b’s must alternate in w. Thus, the logical “OR” of (1) through (4) is
equivalent to the statement “w is not of the form baba···ba.” We have proved
that the “OR” of (1) through (4) implies h(w) is not in L. That statement is
the contrapositive of the statement we wanted: “if h(w) is in L, then w is of
the form baba···ba.”
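The claim of Example 4.15 can also be spot-checked mechanically. The sketch below, assuming Python's `re` module for the two regular expressions, tests every string over {a, b} up to a chosen length and confirms that h(w) is in L exactly when w matches (ba)*. A finite check is, of course, no substitute for the proof above.

```python
import itertools
import re

H = {"a": "01", "b": "10"}          # the homomorphism of Example 4.15
L_RE = re.compile(r"(?:00|1)*")     # L = L((00 + 1)*)
BA_STAR = re.compile(r"(?:ba)*")    # the claimed h^{-1}(L) = L((ba)*)


def h(w: str) -> str:
    """Apply the homomorphism symbol by symbol."""
    return "".join(H[c] for c in w)


def in_inverse_image(w: str) -> bool:
    """By definition, w is in h^{-1}(L) exactly when h(w) is in L."""
    return L_RE.fullmatch(h(w)) is not None


def claim_holds(max_len: int = 8) -> bool:
    """Check h(w) in L <=> w in (ba)* for every w over {a, b} with |w| <= max_len."""
    for n in range(max_len + 1):
        for w in map("".join, itertools.product("ab", repeat=n)):
            if in_inverse_image(w) != (BA_STAR.fullmatch(w) is not None):
                return False
    return True


print(claim_holds())  # prints True
```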


We shall next prove that the inverse homomorphism of a regular language 
is also regular, and then show how the theorem can be used. 


Theorem 4.16: If h is a homomorphism from alphabet Σ to alphabet T, and
L is a regular language over T, then h⁻¹(L) is also a regular language.


PROOF: The proof starts with a DFA A for L. We construct from A and h a
DFA for h⁻¹(L) using the plan suggested by Fig. 4.6. This DFA uses the states
of A but translates the input symbol according to h before deciding on the next
state.

Formally, let L be L(A), where DFA A = (Q, T, δ, q₀, F). Define a DFA

$$B = (Q, \Sigma, \gamma, q_0, F)$$
Figure 4.6: The DFA for h⁻¹(L) applies h to its input, and then simulates the
DFA for L (input a → h → input h(a) to A → accept/reject)


where transition function γ is constructed by the rule γ(q, a) = δ̂(q, h(a)). That
is, the transition B makes on input a is the result of the sequence of transitions
that A makes on the string of symbols h(a). Remember that h(a) could be ε,
it could be one symbol, or it could be many symbols, but δ̂ is properly defined
to take care of all these cases.

It is an easy induction on |w| to show that γ̂(q₀, w) = δ̂(q₀, h(w)). Since the
accepting states of A and B are the same, B accepts w if and only if A accepts
h(w). Put another way, B accepts exactly those strings w that are in h⁻¹(L).
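A minimal Python sketch of this construction, assuming a DFA stored as a dictionary from (state, symbol) pairs to states and a homomorphism stored as a dictionary from symbols to strings. The automaton A below, a DFA for the language (00 + 1)* of Example 4.15 with a dead state, is our own illustration; combined with the homomorphism of that example, B should accept exactly L((ba)*).

```python
import itertools
import re


def extended_delta(delta, q, s):
    """delta-hat: apply delta once per symbol of s; if s is empty, return q."""
    for c in s:
        q = delta[(q, c)]
    return q


def inverse_hom_dfa(states, sigma, delta, start, accepting, h):
    """Return B = (Q, Sigma, gamma, q0, F) with gamma(q, a) = delta-hat(q, h(a))."""
    gamma = {(q, a): extended_delta(delta, q, h[a]) for q in states for a in sigma}
    return states, sigma, gamma, start, accepting


# Hypothetical DFA A for L((00 + 1)*): "e" = no pending 0, "o" = one pending 0,
# "d" = dead state entered after an isolated 0.
Q = {"e", "o", "d"}
DELTA = {("e", "0"): "o", ("e", "1"): "e",
         ("o", "0"): "e", ("o", "1"): "d",
         ("d", "0"): "d", ("d", "1"): "d"}
B = inverse_hom_dfa(Q, "ab", DELTA, "e", {"e"}, {"a": "01", "b": "10"})


def b_accepts(w):
    _, _, gamma, start, accepting = B
    return extended_delta(gamma, start, w) in accepting


def agrees_with_ba_star(max_len=6):
    """Brute-force check that L(B) agrees with L((ba)*) up to length max_len."""
    for n in range(max_len + 1):
        for w in map("".join, itertools.product("ab", repeat=n)):
            if b_accepts(w) != (re.fullmatch(r"(?:ba)*", w) is not None):
                return False
    return True
```

Note that B has exactly the states of A; only the transition table is recomputed, one entry per state and input symbol of Σ.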


Example 4.17: In this example we shall use inverse homomorphism and several
other closure properties of regular sets to prove an odd fact about finite
automata. Suppose we required that a DFA visit every state at least once when
accepting its input. More precisely, suppose A = (Q, Σ, δ, q₀, F) is a DFA, and
we are interested in the language L of all strings w in Σ* such that δ̂(q₀, w)
is in F, and also for every state q in Q there is some prefix x_q of w such that
δ̂(q₀, x_q) = q. Is L regular? We can show it is, but the construction is complex.

First, start with the language M that is L(A), i.e., the set of strings that
A accepts in the usual way, without regard to what states it visits during the
processing of its input. Note that L ⊆ M, since the definition of L puts an
additional condition on the strings of L(A). Our proof that L is regular begins
by using an inverse homomorphism to, in effect, place the states of A into the
input symbols. More precisely, let us define a new alphabet T consisting of
symbols that we may think of as triples [paq], where:


1. p and q are states in Q,


2. a is a symbol in Σ, and


3. δ(p, a) = q.
That is, we may think of the symbols in T as representing transitions of the
automaton A. It is important to see that the notation [paq] is our way of
expressing a single symbol, not the concatenation of three symbols. We could
have given it a single letter as a name, but then its relationship to p, q, and a
would be hard to describe.

Now, define the homomorphism h([paq]) = a for all p, a, and q. That is, h
removes the state components from each of the symbols of T and leaves only
the symbol from Σ. Our first step in showing L is regular is to construct the
language L₁ = h⁻¹(M). Since M is regular, so is L₁ by Theorem 4.16. The
strings of L₁ are just the strings of M with a pair of states, representing a
transition, attached to each symbol.

As a very simple illustration, consider the two-state automaton of Fig.
4.4(a). The alphabet Σ is {0, 1}, and the alphabet T consists of the four symbols
[p0q], [q0q], [p1p], and [q1q]. For instance, there is a transition from state
p to q on input 0, so [p0q] is one of the symbols of T. Since 101 is a string
accepted by the automaton, h⁻¹ applied to this string will give us 2³ = 8 strings,
of which [p1p][p0q][q1q] and [q1q][q0q][p1p] are two examples.
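To make the count concrete, the Python sketch below enumerates h⁻¹(w) for a single string w, representing each symbol [paq] of T as a triple (p, a, q). The four triples are read off from the list of symbols of T just given; the function names are our own. Each symbol a of w may be replaced by any triple whose middle component is a, so the preimages form a Cartesian product. Most of them, like the second example above, do not describe an actual computation of the automaton.

```python
import itertools

# Symbols [paq] of T for the automaton of Fig. 4.4(a), written as triples (p, a, q).
T = [("p", "0", "q"), ("q", "0", "q"), ("p", "1", "p"), ("q", "1", "q")]


def h(symbol):
    """h([paq]) = a: strip the two state components."""
    return symbol[1]


def inverse_image(w):
    """All strings over T (as tuples of triples) whose image under h is w."""
    choices = [[t for t in T if h(t) == a] for a in w]  # preimages of each symbol
    return list(itertools.product(*choices))


preimages = inverse_image("101")
print(len(preimages))  # prints 8
```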

We shall now construct L from L₁ by using a series of further operations
that preserve regular languages. Our first goal is to eliminate all those strings
of L₁ that deal incorrectly with states. That is, we can think of a symbol like
[paq] as saying the automaton was in state p, read input a, and thus entered 
state q. The sequence of symbols must satisfy three conditions if it is to be 
deemed an accepting computation of A: 


1. The first state in the first symbol must be q₀, the start state of A.


2. Each transition must pick up where the previous one left off. That is, 
the first state in one symbol must equal the second state of the previous 
symbol. 


3. The second state of the last symbol must be in F. This condition in fact
will be guaranteed once we enforce (1) and (2), since we know that every
string in L₁ came from a string accepted by A.


The plan of the construction of L is shown in Fig. 4.7. 

We enforce (1) by intersecting L₁ with the set of strings that begin with a
symbol of the form [q₀aq] for some symbol a and state q. That is, let E₁ be the
expression [q₀a₁q₁] + [q₀a₂q₂] + ···, where the pairs aᵢqᵢ range over all pairs in
Σ × Q such that δ(q₀, aᵢ) = qᵢ. Then let L₂ = L₁ ∩ L(E₁T*). Since E₁T* is
a regular expression denoting all strings in T* that begin with the start state
(treat T in the regular expression as the sum of its symbols), L₂ is all strings
that are formed by applying h⁻¹ to language M and that have the start state
as the first component of their first symbol; i.e., it meets condition (1).

To enforce condition (2), it is easier to subtract from L₂ (using the
set-difference operation) all those strings that violate it. Let E₂ be the regular
expression consisting of the sum (union) of the concatenation of all pairs of
symbols that fail to match; that is, pairs of the form [paq][rbs] where q ≠ r.
Then T*E₂T* is a regular expression denoting all strings that fail to meet
condition (2).

Figure 4.7: Constructing language L from language M by applying operations
that preserve regularity of languages:

M: the language of automaton A
  ↓ inverse homomorphism
L₁: strings of M with state transitions embedded
  ↓ intersection with a regular language
L₂: add condition that first state is the start state
  ↓ difference with a regular language
L₃: add condition that adjacent states are equal
  ↓ difference with regular languages
L₄: add condition that all states appear on the path
  ↓ homomorphism
L: delete state components, leaving the symbols

We may now define L₃ = L₂ − L(T*E₂T*). The strings of L₃ satisfy condition
(1) because strings in L₂ must begin with a symbol whose first component is the
start state. They satisfy condition (2) because the subtraction of L(T*E₂T*)
removes any string that violates that condition. Finally, they satisfy condition
(3), that the last state is accepting, because we started with only strings in M,
all of which lead to acceptance by A. The effect is that L₃ consists of the strings
in M with the states of the accepting computation of that string embedded as
part of each symbol. Note that L₃ is regular because it is the result of starting
with the regular language M, and applying operations (inverse homomorphism,
intersection, and set difference) that yield regular sets when applied to regular
sets.


Recall that our goal was to accept only those strings in M that visited
every state in their accepting computation. We may enforce this condition by
additional applications of the set-difference operator. That is, for each state q,
let E₃(q) be the regular expression that is the sum of all the symbols in T in
which q appears in neither the first nor the last position. If we subtract
L(E₃(q)*) from L₃, we have those strings that are an accepting computation of A
and that visit state q at least once. If we subtract from L₃ all the languages
L(E₃(q)*) for q in Q, then we have the accepting computations of A that visit all
the states. Call this language L₄. By Theorem 4.10 we know L₄ is also regular.

Our final step is to construct L from L₄ by getting rid of the state
components. That is, L = h(L₄). Now, L is the set of strings in Σ* that are
accepted by A and that visit each state of A at least once during their
acceptance. Since regular languages are closed under homomorphisms, we conclude
that L is regular.
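The construction can be checked by brute force on the two-state automaton of Fig. 4.4(a). In the Python sketch below the transitions are read off from the symbols of T listed earlier, while the start state p and the accepting set {q} are assumptions made only for this illustration (they are consistent with 101 being accepted via [p1p][p0q][q1q]). The function `via_pipeline` follows the steps of Fig. 4.7 on the preimages of a single string w, and `direct` tests the definition of L; the two should agree on every w.

```python
import itertools

Q = {"p", "q"}
# Symbols [paq] of T as triples (p, a, q); they also give the transitions of A.
T = [("p", "0", "q"), ("q", "0", "q"), ("p", "1", "p"), ("q", "1", "q")]
DELTA = {(p, a): q for (p, a, q) in T}
START, FINAL = "p", {"q"}  # assumed start and accepting states (hypothetical)


def direct(w):
    """w is in L iff delta-hat(q0, w) is in F and every state is reached on some prefix."""
    state, visited = START, {START}  # the empty prefix reaches q0
    for a in w:
        state = DELTA[(state, a)]
        visited.add(state)
    return state in FINAL and visited == Q


def via_pipeline(w):
    """Decide whether w is in h(L4) by filtering h^{-1}(w) as in Fig. 4.7."""
    choices = [[t for t in T if t[1] == a] for a in w]
    for x in itertools.product(*choices):  # x ranges over h^{-1}(w)
        if not x or x[0][0] != START:  # condition (1): x must be in L2
            continue
        if any(u[2] != v[0] for u, v in zip(x, x[1:])):  # condition (2): L3
            continue
        if x[-1][2] not in FINAL:  # x is now the run on w, so this tests w in M
            continue
        if {s for t in x for s in (t[0], t[2])} == Q:  # every state appears: L4
            return True
    return False


for n in range(7):
    for w in map("".join, itertools.product("01", repeat=n)):
        assert direct(w) == via_pipeline(w), w
```

Because A is deterministic, at most one preimage of w survives conditions (1) and (2): the sequence of transitions A actually makes on w.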


4.2.5 Exercises for Section 4.2 


Exercise 4.2.1: Suppose h is the homomorphism from the alphabet {0, 1,2} 
to the alphabet {a,b} defined by: h(0) = a; h(1) = ab, and h(2) = ba. 


* a) What is h(0120)?


b) What is h(21120)?


* c) If L is the language L(01*2), what is h(L)?


d) If L is the language L(0 + 12), what is h(L)?


* e) Suppose L is the language {ababa}, that is, the language consisting of
only the one string ababa. What is h⁻¹(L)?


! f) If L is the language L(a(ba)*), what is h⁻¹(L)?


*! Exercise 4.2.2: If L is a language, and a is a symbol, then L/a, the quotient
of L and a, is the set of strings w such that wa is in L. For example, if
L = {a, aab, baa}, then L/a = {ε, ba}. Prove that if L is regular, so is L/a.
Hint: Start with a DFA for L and consider the set of accepting states.


! Exercise 4.2.3: If L is a language, and a is a symbol, then a\L is the set
of strings w such that aw is in L. For example, if L = {a, aab, baa}, then
a\L = {ε, ab}. Prove that if L is regular, so is a\L. Hint: Remember that the
regular languages are closed under reversal and under the quotient operation of
Exercise 4.2.2.


! Exercise 4.2.4: Which of the following identities are true? 


a) (L/a)a = L (the left side represents the concatenation of the languages 
L/a and {a}). 


b) a(a\L) = L (again, concatenation with {a}, this time on the left, is 
intended). 


c) (La)/a= L. 
d) a\(aL) = L. 


Exercise 4.2.5: The operation of Exercise 4.2.3 is sometimes viewed as a
“derivative,” and a\L is written dL/da. These derivatives apply to regular
expressions in a manner similar to the way ordinary derivatives apply to
arithmetic expressions. Thus, if R is a regular expression, we shall use dR/da
to mean the same as dL/da, if L = L(R).


a) Show that d(R + S)/da = dR/da + dS/da.


*! b) Give the rule for the “derivative” of RS. Hint: You need to consider two
cases: if L(R) does or does not contain ε. This rule is not quite the same
as the “product rule” for ordinary derivatives, but is similar.


! c) Give the rule for the “derivative” of a closure, i.e., d(R*)/da.


d) Use the rules from (a)-(c) to find the “derivatives” of regular expression
(0 + 1)*011 with respect to 0 and 1.


* e) Characterize those languages L for which dL/d0 = ∅.


*! f) Characterize those languages L for which dL/d0 = L.
Exercise 4.2.6: Show that the regular languages are closed under the follow- 
ing operations: 


a) min(L) = {w | w is in L, but no proper prefix of w is in L}.
b) max(L) = {w | w is in L and for no x other than ε is wx in L}.
c) init(L) = {w | for some x, wx is in L}.


Hint: Like Exercise 4.2.2, it is easiest to start with a DFA for L and perform a 
construction to get the desired language. 


Exercise 4.2.7: If w = a₁a₂···aₙ and x = b₁b₂···bₙ are strings of the same
length, define alt(w, x) to be the string in which the symbols of w and x
alternate, starting with w, that is, a₁b₁a₂b₂···aₙbₙ. If L and M are languages,
define alt(L, M) to be the set of strings of the form alt(w, x), where w is any
string in L and x is any string in M of the same length. Prove that if L and
M are regular, so is alt(L, M).


Exercise 4.2.8: Let L be a language. Define half(L) to be the set of first
halves of strings in L, that is, {w | for some x such that |x| = |w|, we have wx
in L}. For example, if L = {ε, 0010, 011, 010110} then half(L) = {ε, 00, 010}.
Notice that odd-length strings do not contribute to half(L). Prove that if L is
a regular language, so is half(L).


! Exercise 4.2.9: We can generalize Exercise 4.2.8 to a number of functions that
determine how much of the string we take. If f is a function of integers, define
f(L) to be {w | for some x, with |x| = f(|w|), we have wx in L}. For instance,
the operation half corresponds to f being the identity function f(n) = n, since
half(L) is defined by having |x| = |w|. Show that if L is a regular language,
then so is f(L), if f is one of the following functions:


a) f(n) = 2n (i.e., take the first thirds of strings).


b) f(n) = n² (i.e., the amount we take has length equal to the square root
of what we do not take).


c) f(n) = 2ⁿ (i.e., what we take has length equal to the logarithm of what
we leave).


! Exercise 4.2.10: Suppose that L is any language, not necessarily regular,
whose alphabet is {0}; i.e., the strings of L consist of 0’s only. Prove that L* is
regular. Hint: At first, this theorem sounds preposterous. However, an example
will help you see why it is true. Consider the language L = {0ⁱ | i is prime},
which we know is not regular by Example 4.3. Strings 00 and 000 are in L,
since 2 and 3 are both primes. Thus, if j ≥ 2, we can show 0ʲ is in L*. If j is
even, use j/2 copies of 00, and if j is odd, use one copy of 000 and (j − 3)/2
copies of 00. Thus, L* = ε + 000*.


! Exercise 4.2.11: Show that the regular languages are closed under the
following operation: cycle(L) = {w | we can write w as w = xy, such that yx is
in L}. For example, if L = {01, 011}, then cycle(L) = {01, 10, 011, 110, 101}.
Hint: Start with a DFA for L and construct an ε-NFA for cycle(L).


! Exercise 4.2.12: Let w₁ = a₀a₀a₁, and wᵢ = wᵢ₋₁wᵢ₋₁aᵢ for all i > 1.
For instance, w₃ = a₀a₀a₁a₀a₀a₁a₂a₀a₀a₁a₀a₀a₁a₂a₃. The shortest regular
expression for the language Lₙ = {wₙ}, i.e., the language consisting of the one
string wₙ, is the string wₙ itself, and the length of this expression is 2ⁿ⁺¹ − 1.
However, if we allow the intersection operator, we can write an expression for
Lₙ whose length is O(n²). Find such an expression. Hint: Find n languages,
each with regular expressions of length O(n), whose intersection is Lₙ.


Exercise 4.2.13: We can use closure properties to help prove certain languages
are not regular. Start with the fact that the language

$$L_{0n1n} = \{0^n 1^n \mid n \ge 0\}$$

is not a regular set. Prove the following languages not to be regular by
transforming them, using operations known to preserve regularity, to L₀ₙ₁ₙ:


* a) {0ⁱ1ʲ | i ≠ j}.
b) {0ⁿ1ᵐ2ⁿ⁻ᵐ | n ≥ m ≥ 0}.


Exercise 4.2.14: In Theorem 4.8, we described the “product construction” 
that took two DFA’s and constructed one DFA whose language is the intersec- 
tion of the languages of the first two. 
a) Show how to perform the product construction on NFA’s (without
ε-transitions).


! b) Show how to perform the product construction on ε-NFA’s.


* c) Show how to modify the product construction so the resulting DFA ac- 
cepts the difference of the languages of the two given DFA’s. 


d) Show how to modify the product construction so the resulting DFA ac- 
cepts the union of the languages of the two given DFA’s. 


Exercise 4.2.15: In the proof of Theorem 4.8 we claimed that it could be 
proved by induction on the length of w that 


$$\hat{\delta}((q_L, q_M), w) = \bigl(\hat{\delta}_L(q_L, w), \hat{\delta}_M(q_M, w)\bigr)$$
Give this inductive proof. 


Exercise 4.2.16: Complete the proof of Theorem 4.14 by considering the cases 
where expression E is a concatenation of two subexpressions and where E is 
the closure of an expression. 


Exercise 4.2.17: In Theorem 4.16, we omitted a proof by induction on the
length of w that γ̂(q₀, w) = δ̂(q₀, h(w)). Prove this statement.


4.3 Decision Properties of Regular Languages 


In this section we consider how one answers important questions about regular 
languages. First, we must consider what it means to ask a question about a 
language. The typical language is infinite, so you cannot present the strings of 
the language to someone and ask a question that requires them to inspect the 
infinite set of strings. Rather, we present a language by giving one of the finite
representations for it that we have developed: a DFA, an NFA, an ε-NFA, or a
regular expression.

Of course the language so described will be regular, and in fact there is no 
way at all to represent completely arbitrary languages. In later chapters we 
shall see finite ways to represent more than the regular languages, so we can 
consider questions about languages in these more general classes. However, for 
many of the questions we ask, algorithms exist only for the class of regular 
languages. The same questions become “undecidable” (no algorithm to answer 
them exists) when posed using more “expressive” notations (i.e., notations that 
can be used to express a larger set of languages) than the representations we 
have developed for the regular languages. 

We begin our study of algorithms for questions about regular languages by 
reviewing the ways we can convert one representation into another for the same 
language. In particular, we want to observe the time complexity of the algo- 
rithms that perform the conversions. We then consider some of the fundamental 
questions about languages: 
1. Is the language described empty?
2. Is a particular string w in the described language? 


3. Do two descriptions of a language actually describe the same language? 
This question is often called “equivalence” of languages. 


4.3.1 Converting Among Representations 


We know that we can convert any of the four representations for regular lan- 
guages to any of the other three representations. Figure 3.1 gave paths from 
any representation to any of the others. While there are algorithms for any 
of the conversions, sometimes we are interested not only in the possibility of 
making a conversion, but in the amount of time it takes. In particular, it is 
important to distinguish algorithms that take exponential time (as a
function of the size of their input), and therefore can be performed only for
relatively small instances, from those that take time that is a linear, quadratic,
or some small-degree polynomial of their input size. The latter algorithms are 
“realistic,” in the sense that we expect them to be executable for large instances 
of the problem. We shall consider the time complexity of each of the conversions 
we discussed. 


Converting NFA’s to DFA’s 


When we start with either an NFA or an ε-NFA and convert it to a DFA, the
time can be exponential in the number of states of the NFA. First, computing
the ε-closure of n states takes O(n³) time. We must search from each of the n
states along all arcs labeled ε. If there are n states, there can be no more than
n² arcs. Judicious bookkeeping and well-designed data structures will make
sure that we can explore from each state in O(n²) time. In fact, a transitive
closure algorithm such as Warshall’s algorithm can be used to compute the
entire ε-closure at once.³
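Here is a minimal Python sketch of the Warshall-style computation, assuming the states are numbered 0 through n − 1 and the ε-arcs are given as a list of pairs. The triple loop over k, i, and j makes the O(n³) bound evident.

```python
def epsilon_closure(n, eps_arcs):
    """Warshall's algorithm on the ε-arcs of an n-state ε-NFA (states 0..n-1).

    reach[i][j] becomes True iff j is reachable from i along ε-labeled arcs
    (every state reaches itself by the empty path). Runs in O(n^3) time.
    """
    reach = [[i == j for j in range(n)] for i in range(n)]
    for i, j in eps_arcs:
        reach[i][j] = True
    for k in range(n):  # now allow paths that pass through state k
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return [{j for j in range(n) if reach[i][j]} for i in range(n)]


closures = epsilon_closure(4, [(0, 1), (1, 2)])  # ε-arcs 0 -> 1 -> 2; state 3 isolated
```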

Once the ε-closure is computed, we can compute the equivalent DFA by the
subset construction. The dominant cost is, in principle, the number of states
of the DFA, which can be 2ⁿ. For each state, we can compute the transitions
in O(n³) time by consulting the ε-closure information and the NFA’s transition
table for each of the input symbols. That is, suppose we want to compute
δ({q₁, q₂, …, qₖ}, a) for the DFA. There may be as many as n states reachable
from each qᵢ along ε-labeled paths, and each of those states may have up to n
arcs labeled a. By creating an array indexed by states, we can compute the
union of up to n sets of up to n states in time proportional to n².

In this way, we can compute, for each qᵢ, the set of states reachable from
qᵢ along a path labeled a (possibly including ε’s). Since k ≤ n, there are at
most n states to deal with. We compute the reachable states for each in O(n²)
time. Thus, the total time spent computing reachable states is O(n³). The
union of the sets of reachable states requires only O(n²) additional time, and
we conclude that the computation of one DFA transition takes O(n³) time.

³For a discussion of transitive closure algorithms, see A. V. Aho, J. E. Hopcroft, and J.
D. Ullman, Data Structures and Algorithms, Addison-Wesley, 1984.

Note that the number of input symbols is assumed constant, and does not 
depend on n. Thus, in this and other estimates of running time, we do not 
consider the number of input symbols as a factor. The size of the input alpha- 
bet influences the constant factor that is hidden in the “big-oh” notation, but 
nothing more. 

Our conclusion is that the running time of NFA-to-DFA conversion, including
the case where the NFA has ε-transitions, is O(n³2ⁿ). Of course in practice
it is common that the number of states created is much less than 2ⁿ, often only
n states. We could state the bound on the running time as O(n³s), where s is
the number of states the DFA actually has.
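The Python sketch below builds only the DFA states that are reachable from the start state, which is why the cost in practice tracks s rather than 2ⁿ. It assumes an NFA given as a dictionary `delta` from (state, symbol) pairs to sets of states and a dictionary `eps` from states to their ε-successors. For simplicity it recomputes ε-closures by graph search instead of consulting a precomputed table.

```python
from collections import deque


def eclose(states, eps):
    """ε-closure of a set of states, found by graph search over ε-arcs."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def subset_construction(start, sigma, delta, eps, accepting):
    """Build only the DFA states reachable from the ε-closure of the start state.

    Returns (start, transitions, accepting, states); DFA states are frozensets.
    """
    d_start = eclose({start}, eps)
    d_delta, d_accepting = {}, set()
    seen, queue = {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        if S & accepting:  # a subset accepts if it contains an NFA accepting state
            d_accepting.add(S)
        for a in sigma:
            moved = {r for q in S for r in delta.get((q, a), ())}
            target = eclose(moved, eps)
            d_delta[(S, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_delta, d_accepting, seen


# Hypothetical NFA for strings over {0, 1} ending in 01 (no ε-arcs).
NFA_DELTA = {(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}}
start, dfa, final, states = subset_construction(0, "01", NFA_DELTA, {}, {2})
print(len(states))  # prints 3: only 3 of the 2**3 = 8 subsets are reachable
```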


DFA-to-NFA Conversion 


This conversion is simple, and takes O(n) time on an n-state DFA. All that we 
need to do is modify the transition table for the DFA by putting set-brackets 
around states and, if the output is an ε-NFA, adding a column for ε. Since we
treat the number of input symbols (i.e., the width of the transition table) as a 
constant, copying and processing the table takes O(n) time. 
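In code the conversion is a single pass over the transition table. In this Python sketch the dictionary encoding is an assumption: a DFA table maps (state, symbol) to a state, and the key (q, None) stands for the ε-column of state q.

```python
def dfa_to_nfa(delta, states=None):
    """Turn a DFA table {(q, a): p} into an NFA table {(q, a): {p}}.

    If `states` is given, also add an empty ε-column, keyed (q, None),
    so the result can be read as an ε-NFA. One pass: O(n) for a fixed alphabet.
    """
    nfa = {key: {target} for key, target in delta.items()}
    if states is not None:
        for q in states:
            nfa[(q, None)] = set()
    return nfa
```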


Automaton-to-Regular-Expression Conversion 


If we examine the construction of Section 3.2.1 we observe that at each of n
rounds (where n is the number of states of the DFA) we can quadruple the size
of the regular expressions constructed, since each is built from four expressions
of the previous round. Thus, simply writing down the n³ expressions can take
time O(n³4ⁿ). The improved construction of Section 3.2.2 reduces the constant
factor, but does not affect the worst-case exponentiality of the problem.
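To see the growth, the following Python sketch implements the recurrence of Section 3.2.1, $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\bigl(R_{kk}^{(k-1)}\bigr)^* R_{kj}^{(k-1)}$, producing expressions in Python's `re` syntax. The encoding is an assumption of the sketch: `None` stands for ∅, the empty string for ε, `(?:…)` for grouping, and alphabet symbols are assumed not to be regex metacharacters. It records the length of the longest expression after each round, and a brute-force comparison with the DFA serves as a sanity check.

```python
import itertools
import re

EMPTY = None  # stands for the regular expression ∅; "" stands for ε


def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return r if r == s else f"(?:{r}|{s})"


def concat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY
    return r + s  # no operand has a top-level |, so juxtaposition is safe


def star(r):
    return "" if r is EMPTY or r == "" else f"(?:{r})*"  # ∅* = ε* = ε


def dfa_to_regex(n, sigma, delta, start, accepting):
    """States are 1..n. Round k computes every R_ij^(k) from the round k-1 table."""
    states = range(1, n + 1)
    R = {}
    for i in states:
        for j in states:
            r = EMPTY
            for a in sigma:
                if delta[(i, a)] == j:
                    r = union(r, a)
            R[i, j] = union(r, "") if i == j else r  # basis: arcs, plus ε on the diagonal
    sizes = []
    for k in states:
        R = {(i, j): union(R[i, j], concat(R[i, k], concat(star(R[k, k]), R[k, j])))
             for i in states for j in states}
        sizes.append(max(len(r or "") for r in R.values()))
    result = EMPTY
    for f in accepting:
        result = union(result, R[start, f])
    return result, sizes


def agrees_with_dfa(pattern, sigma, delta, start, accepting, max_len=6):
    """Brute-force comparison of the pattern with direct simulation of the DFA."""
    for n in range(max_len + 1):
        for w in map("".join, itertools.product(sigma, repeat=n)):
            q = start
            for a in w:
                q = delta[(q, a)]
            if (re.fullmatch(pattern, w) is not None) != (q in accepting):
                return False
    return True


# Hypothetical 2-state DFA over {0, 1} accepting strings with an even number of 0's.
EVEN = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
pattern, sizes = dfa_to_regex(2, "01", EVEN, 1, {1})
print(sizes, agrees_with_dfa(pattern, "01", EVEN, 1, {1}))
```

Even with the light simplifications in `union`, `concat`, and `star`, the printed sizes grow quickly from round to round; without them each round could roughly quadruple the expression length, as the text observes.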

The same construction works in the same running time if the input is an
NFA, or even an ε-NFA, although we did not prove those facts. It is important
to use those constructions for NFA’s, however. If we first convert an NFA to
a DFA and then convert the DFA to a regular expression, it could take time
$O(8^n 4^{2^n})$, which is doubly exponential.


Regular-Expression-to-Automaton Conversion 


Conversion of a regular expression to an ε-NFA takes linear time. We need to
parse the expression efficiently, using a technique that takes only O(n) time on
a regular expression of length n.⁴ The result is an expression tree with one
node for each symbol of the regular expression (although parentheses do not 
have to appear in the tree; they just guide the parsing of the expression). 


⁴Parsing methods capable of doing this task in O(n) time are discussed in A. V. Aho,
R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley,
1986.
Once we have an expression tree for the regular expression, we can work
up the tree, building the e-NFA for each node. The construction rules for the 
conversion of a regular expression that we saw in Section 3.2.3 never add more 
than two states and four arcs for any node of the expression tree. Thus, the 
numbers of states and arcs of the resulting e-NFA are both O(n). Moreover, 
the work at each node of the parse tree in creating these elements is constant, 
provided the function that processes each subtree returns pointers to the start 
and accepting states of its automaton. 

We conclude that construction of an ε-NFA from a regular expression takes
time that is linear in the size of the expression. We can eliminate ε-transitions
from an n-state ε-NFA, to make an ordinary NFA, in O(n³) time, without
increasing the number of states. However, proceeding to a DFA can take
exponential time.
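The following Python sketch shows the bottom-up pass over an expression tree, assuming the tree has already been produced by a parser and is encoded as nested tuples (a hypothetical encoding). Union and closure nodes add two states and four ε-arcs, symbol and ε nodes add two states and one arc, and concatenation adds only one ε-arc, in line with the bound quoted above. Each call returns the start and accepting states of its subautomaton, which is what keeps the work per node constant. The small simulator `accepts` is included only to exercise the result.

```python
class ENFA:
    """An ε-NFA under construction: arcs[q] lists (label, target); label None is ε."""

    def __init__(self):
        self.arcs = []

    def new_state(self):
        self.arcs.append([])
        return len(self.arcs) - 1


def build(node, m):
    """Add the ε-NFA for `node` to m and return its (start, accepting) states.

    Hypothetical tree encoding: ("sym", a), ("eps",), ("empty",),
    ("union", left, right), ("concat", left, right), ("star", child).
    """
    kind = node[0]
    if kind == "concat":  # no new states: link left's accepting state to right's start
        s1, f1 = build(node[1], m)
        s2, f2 = build(node[2], m)
        m.arcs[f1].append((None, s2))
        return s1, f2
    s, f = m.new_state(), m.new_state()
    if kind == "sym":
        m.arcs[s].append((node[1], f))
    elif kind == "eps":
        m.arcs[s].append((None, f))
    elif kind == "union":
        for child in node[1:]:
            cs, cf = build(child, m)
            m.arcs[s].append((None, cs))
            m.arcs[cf].append((None, f))
    elif kind == "star":
        cs, cf = build(node[1], m)
        m.arcs[s] += [(None, cs), (None, f)]
        m.arcs[cf] += [(None, cs), (None, f)]
    elif kind != "empty":  # ("empty",) keeps its two states and no arcs
        raise ValueError(f"unknown node type {kind!r}")
    return s, f


def accepts(m, start, final, w):
    """Simulate the ε-NFA on w by tracking sets of states."""
    def close(states):
        stack, seen = list(states), set(states)
        while stack:
            for label, r in m.arcs[stack.pop()]:
                if label is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = close({start})
    for a in w:
        current = close({r for q in current for label, r in m.arcs[q] if label == a})
    return final in current


# (0 + 1)*01 as an expression tree
tree = ("concat", ("concat", ("star", ("union", ("sym", "0"), ("sym", "1"))),
                   ("sym", "0")), ("sym", "1"))
m = ENFA()
start, final = build(tree, m)
```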


4.3.2 Testing Emptiness of Regular Languages 


At first glance the answer to the question “is regular language L empty?” is
obvious: ∅ is empty, and all other regular languages are not. However, as we
discussed at the beginning of Section 4.3, the problem is not stated with an
explicit list of the strings in L. Rather, we are given some representation for L
and need to decide whether that representation denotes the language ∅.

If our representation is any kind of finite automaton, the emptiness question 
is whether there is any path whatsoever from the start state to some accepting 
state. If so, the language is nonempty, while if the accepting states are all 
separated from the start state, then the language is empty. Deciding whether 
we can reach an accepting state from the start state is a simple instance of 
graph-reachability, similar in spirit to the calculation of the ε-closure that we
discussed in Section 2.5.3. The algorithm can be summarized by this recursive 
process. 


BASIS: The start state is surely reachable from the start state. 


INDUCTION: If state q is reachable from the start state, and there is an arc
from q to p with any label (an input symbol, or ε if the automaton is an ε-NFA),
then p is reachable.


In that manner we can compute the set of reachable states. If any accepting 
state is among them, we answer “no” (the language of the automaton is not 
empty), and otherwise we answer “yes.” Note that the reachability calculation
takes no more time than O(n²) if the automaton has n states, and in fact it is
no worse than proportional to the number of arcs in the automaton’s transition
diagram, which could be less than n² and cannot be more than O(n²).
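The basis and induction above amount to a breadth-first search. A minimal Python sketch, assuming the automaton is given as a dictionary from each state to the targets of its arcs (labels are irrelevant here, so ε-arcs are simply included):

```python
from collections import deque


def is_empty(start, accepting, successors):
    """Return True iff no accepting state is reachable from the start state.

    successors maps each state to the targets of its arcs, whatever their labels
    (ε-arcs included for an ε-NFA). Each arc is examined at most once.
    """
    reached, queue = {start}, deque([start])  # basis: start reaches itself
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False  # an accepting state is reachable: language not empty
        for p in successors.get(q, ()):  # induction: q reachable and arc q -> p
            if p not in reached:
                reached.add(p)
                queue.append(p)
    return True
```

The function returns True exactly when the text's procedure answers “yes” (the language is empty).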

If we are given a regular expression representing the language L, rather 
than an automaton, we could convert the expression to an ε-NFA and proceed
as above. Since the automaton that results from a regular expression of length 
n has at most O(n) states and transitions, the algorithm takes O(n) time. 
However, we can also inspect the regular expression to decide whether it
is empty. Notice first that if the expression has no occurrence of ∅, then its
language is surely not empty. If there are ∅’s, the language may or may not be
empty. The following recursive rules tell whether a regular expression denotes
the empty language.


BASIS: ∅ denotes the empty language; ε and a for any input symbol a do not.


INDUCTION: Suppose R is a regular expression. There are four cases to con- 
sider, corresponding to the ways that R could be constructed. 


1. R = R₁ + R₂. Then L(R) is empty if and only if both L(R₁) and L(R₂)
are empty.


2. R = R₁R₂. Then L(R) is empty if and only if either L(R₁) or L(R₂) is
empty.


3. R = R₁*. Then L(R) is not empty; it always includes at least ε.


4. R = (R₁). Then L(R) is empty if and only if L(R₁) is empty, since they
are the same language.
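These rules translate directly into a recursive function. The Python sketch below assumes the regular expression has already been parsed into a tree of tuples (a hypothetical encoding); each branch corresponds to one basis or induction rule above.

```python
def denotes_empty(r):
    """Decide whether L(r) is empty, for a tree in the hypothetical encoding
    ("empty",), ("eps",), ("sym", a), ("union", r1, r2), ("concat", r1, r2),
    ("star", r1), ("paren", r1).
    """
    kind = r[0]
    if kind == "empty":
        return True  # basis: ∅
    if kind in ("eps", "sym"):
        return False  # basis: ε and a
    if kind == "union":  # rule 1
        return denotes_empty(r[1]) and denotes_empty(r[2])
    if kind == "concat":  # rule 2
        return denotes_empty(r[1]) or denotes_empty(r[2])
    if kind == "star":  # rule 3: the closure always contains ε
        return False
    if kind == "paren":  # rule 4
        return denotes_empty(r[1])
    raise ValueError(f"unknown node type {kind!r}")
```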


4.3.3 Testing Membership in a Regular Language 


The next question of importance is, given a string w and a regular language L,
is w in L? While w is represented explicitly, L is represented by an automaton
or regular expression.

If L is represented by a DFA, the algorithm is simple. Simulate the DFA 
processing the string of input symbols w, beginning in the start state. If the 
DFA ends in an accepting state, the answer is “yes”; otherwise the answer is 
“no.” This algorithm is extremely fast. If |w| = n, and the DFA is represented 
by a suitable data structure, such as a two-dimensional array that is the transi- 
tion table, then each transition requires constant time, and the entire test takes 
O(n) time. 
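A minimal Python sketch of this test, assuming the DFA is stored as a two-dimensional list indexed by state number and by a column number assigned to each input symbol. The example table, a DFA for the language (00 + 1)* of Example 4.15, is our own illustration.

```python
def dfa_member(table, column, start, accepting, w):
    """Run a DFA stored as a 2-D transition table: table[state][column[symbol]].

    Each symbol costs one constant-time lookup, so the test takes O(|w|) time.
    """
    q = start
    for a in w:
        q = table[q][column[a]]
    return q in accepting


# Hypothetical DFA for (00 + 1)*: 0 = no pending 0, 1 = one pending 0, 2 = dead.
TABLE = [[1, 0],
         [0, 2],
         [2, 2]]
COLUMN = {"0": 0, "1": 1}
print(dfa_member(TABLE, COLUMN, 0, {0}, "0010011"))  # prints True
```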

If L has any other representation besides a DFA, we could convert to a DFA 
and run the test above. That approach could take time that is exponential 
in the size of the representation, although it is linear in |w|. However, if the 
representation is an NFA or ε-NFA, it is simpler and more efficient to simulate
the NFA directly. That is, we process symbols of w one at a time, maintaining 
the set of states the NFA can be in after following any path labeled with that 
prefix of w. The idea was presented in Fig. 2.10. 

If w is of length n, and the NFA has s states, then the running time of this
algorithm is O(ns²). Each input symbol can be processed by taking the previous
set of states, which numbers at most s states, and looking at the successors of
each of these states. We take the union of at most s sets of at most s states
each, which requires O(s²) time.

If the NFA has ε-transitions, then we must compute the ε-closure before
starting the simulation. Then the processing of each input symbol a has two
stages, each of which requires O(s²) time. First, we take the previous set of
states and find their successors on input symbol a. Next, we compute the
ε-closure of this set of states. The initial set of states for the simulation is the
ε-closure of the initial state of the NFA.

Lastly, if the representation of L is a regular expression of size s, we can
convert to an ε-NFA with at most 2s states, in O(s) time. We then perform
the simulation above, taking O(ns²) time on an input w of length n.
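The two-stage step can be sketched in Python as follows, assuming an ε-NFA stored as a dictionary `delta` from (state, symbol) pairs to sets of states and a dictionary `eps` from states to their ε-successors. Unlike the text, which computes the ε-closure once in advance, this sketch finds closures by graph search at each step; each such search touches at most s states and their ε-arcs, so it stays within the O(s²) per-symbol bound.

```python
def eps_closure(states, eps):
    """All states reachable from `states` along ε-arcs (eps maps a state to its ε-successors)."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def nfa_member(start, accepting, delta, eps, w):
    """Simulate an ε-NFA on w, keeping the set of states reachable after each prefix.

    delta maps (state, symbol) to a set of states.
    """
    current = eps_closure({start}, eps)  # initial set: ε-closure of the start state
    for a in w:
        moved = set()
        for q in current:  # stage 1: successors on input symbol a
            moved |= delta.get((q, a), set())
        current = eps_closure(moved, eps)  # stage 2: ε-closure of that set
    return bool(current & accepting)


# Hypothetical ε-NFA for a+ : 0 -ε-> 1 -a-> 2 -ε-> 0, with 2 accepting.
DELTA = {(1, "a"): {2}}
EPS = {0: [1], 2: [0]}
print(nfa_member(0, {2}, DELTA, EPS, "aaa"))  # prints True
```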


4.3.4 Exercises for Section 4.3 


Exercise 4.3.1: Give an algorithm to tell whether a regular language L is 
infinite. Hint: Use the pumping lemma to show that if the language contains 
any string whose length is above a certain lower limit, then the language must 
be infinite. 


Exercise 4.3.2: Give an algorithm to tell whether a regular language L con- 
tains at least 100 strings. 


Exercise 4.3.3: Suppose L is a regular language with alphabet Σ. Give an
algorithm to tell whether L = Σ*, i.e., all strings over its alphabet.


Exercise 4.3.4: Give an algorithm to tell whether two regular languages L₁
and L₂ have at least one string in common.


Exercise 4.3.5: Give an algorithm to tell, for two regular languages L₁ and
L₂ over the same alphabet Σ, whether there is any string in Σ* that is in neither
L₁ nor L₂.


4.4 Equivalence and Minimization of Automata 


In contrast to the previous questions — emptiness and membership — whose 
algorithms were rather simple, the question of whether two descriptions of two 
regular languages actually define the same language involves considerable intel- 
lectual mechanics. In this section we discuss how to test whether two descriptors 
for regular languages are equivalent, in the sense that they define the same lan- 
guage. An important consequence of this test is that there is a way to minimize 
a DFA. That is, we can take any DFA and find an equivalent DFA that has 
the minimum number of states. In fact, this DFA is essentially unique: given 
any two minimum-state DFA’s that are equivalent, we can always find a way 
to rename the states so that the two DFA’s become the same. 


4.4.1 Testing Equivalence of States 


We shall begin by asking a question about the states of a single DFA. Our goal 
is to understand when two distinct states p and q can be replaced by a single 
state that behaves like both p and q. We say that states p and q are equivalent 
if: 
• For all input strings w, δ̂(p, w) is an accepting state if and only if δ̂(q, w) 
is an accepting state. 


Less formally, it is impossible to tell the difference between equivalent states 
p and q merely by starting in one of the states and asking whether or not a 
given input string leads to acceptance when the automaton is started in this 
(unknown) state. Note we do not require that δ̂(p, w) and δ̂(q, w) are the same 
state, only that either both are accepting or both are nonaccepting. 

If two states are not equivalent, then we say they are distinguishable. That 
is, state p is distinguishable from state q if there is at least one string w such 
that one of δ̂(p, w) and δ̂(q, w) is accepting, and the other is not accepting. 


Example 4.18: Consider the DFA of Fig. 4.8, whose transition function we 
shall refer to as δ in this example. Certain pairs of states are obviously not 
equivalent. For example, C and G are not equivalent because one is accepting 
and the other is not. That is, the empty string distinguishes these two states, 
because δ̂(C, ε) is accepting and δ̂(G, ε) is not. 




Figure 4.8: An automaton with equivalent states 


Consider states A and G. String ε doesn’t distinguish them, because they are 
both nonaccepting states. String 0 doesn’t distinguish them because they go to 
states B and G, respectively, on input 0, and both these states are nonaccepting. 
Likewise, string 1 doesn’t distinguish A from G, because they go to F and E, 
respectively, and both are nonaccepting. However, 01 distinguishes A from G, 
because δ̂(A, 01) = C, δ̂(G, 01) = E, C is accepting, and E is not. Any input 
string that takes A and G to states only one of which is accepting is sufficient 
to prove that A and G are not equivalent. 

In contrast, consider states A and E. Neither is accepting, so ε does not 
distinguish them. On input 1, they both go to state F. Thus, no input string 
that begins with 1 can distinguish A from E, since for any string x, δ̂(A, 1x) = 
δ̂(E, 1x). 

Now consider the behavior of states A and E on inputs that begin with 0. 
They go to states B and H, respectively. Since neither is accepting, string 0 
by itself does not distinguish A from E. However, B and H are no help. On 
input 1 they both go to C, and on input 0 they both go to G. Thus, all inputs 
that begin with 0 will fail to distinguish A from E. We conclude that no input 
string whatsoever will distinguish A from E; i.e., they are equivalent states. 
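The reasoning of Example 4.18 can be mechanized by a breadth-first search over pairs of states: starting from (p, q), follow both states on each input symbol and stop at the first pair in which exactly one state is accepting. This search is our own illustration, not an algorithm from the text, and the function name is hypothetical. Figure 4.8 is not reproduced in the text, so the transition table below is an assumption: it agrees with every transition quoted in this example and in Example 4.19, but the transitions out of C and the targets of D and F on input 1 are our reconstruction.

```python
from collections import deque

# Assumed transition table for Fig. 4.8; C is the only accepting state.
DELTA = {
    "A": {"0": "B", "1": "F"}, "B": {"0": "G", "1": "C"},
    "C": {"0": "A", "1": "C"}, "D": {"0": "C", "1": "G"},
    "E": {"0": "H", "1": "F"}, "F": {"0": "C", "1": "G"},
    "G": {"0": "G", "1": "E"}, "H": {"0": "G", "1": "C"},
}
ACCEPTING = {"C"}


def shortest_distinguishing_string(delta, accepting, p, q):
    """Return a shortest w such that exactly one of delta-hat(p, w), delta-hat(q, w)
    is accepting, or None if no such w exists (p and q are equivalent)."""
    alphabet = sorted(delta[p])
    seen = {(p, q)}
    queue = deque([(p, q, "")])
    while queue:
        r, s, w = queue.popleft()           # r = delta-hat(p, w), s = delta-hat(q, w)
        if (r in accepting) != (s in accepting):
            return w
        for a in alphabet:
            nxt = (delta[r][a], delta[s][a])
            if nxt not in seen:             # each ordered pair is explored at most once
                seen.add(nxt)
                queue.append((nxt[0], nxt[1], w + a))
    return None


print(shortest_distinguishing_string(DELTA, ACCEPTING, "A", "G"))  # 01
print(shortest_distinguishing_string(DELTA, ACCEPTING, "A", "E"))  # None
```

Because pairs are explored in order of string length, the first mismatch found yields a shortest distinguishing string; with this table the search returns 01 for A and G, matching the example, and reports that no string distinguishes A from E.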


To find states that are equivalent, we make our best efforts to find pairs 
of states that are distinguishable. It is perhaps surprising, but true, that if 
we try our best, according to the algorithm to be described below, then any 
pair of states that we do not find distinguishable are equivalent. The algorithm, 
which we refer to as the table-filling algorithm, is a recursive discovery 
of distinguishable pairs in a DFA A = (Q, Σ, δ, q₀, F). 


BASIS: If p is an accepting state and q is nonaccepting, then the pair {p,q} is 
distinguishable. 


INDUCTION: Let p and q be states such that for some input symbol a, r = 
δ(p, a) and s = δ(q, a) are a pair of states known to be distinguishable. Then 
{p, q} is a pair of distinguishable states. The reason this rule makes sense is 
that there must be some string w that distinguishes r from s; that is, exactly 
one of δ̂(r, w) and δ̂(s, w) is accepting. Then string aw must distinguish p from 
q, since δ̂(p, aw) and δ̂(q, aw) are the same pair of states as δ̂(r, w) and δ̂(s, w). 
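A minimal Python sketch of the table-filling algorithm follows, written directly from the basis and induction above: it marks the basis pairs, then sweeps over all unmarked pairs and repeats until a full sweep marks nothing new. Pairs are stored as frozensets so that {p, q} and {q, p} coincide. The function names are hypothetical, and the transition table standing in for Fig. 4.8 is an assumption that agrees with the transitions quoted in Examples 4.18 and 4.19.

```python
from itertools import combinations

# Assumed transition table for Fig. 4.8; C is the only accepting state.
DELTA = {
    "A": {"0": "B", "1": "F"}, "B": {"0": "G", "1": "C"},
    "C": {"0": "A", "1": "C"}, "D": {"0": "C", "1": "G"},
    "E": {"0": "H", "1": "F"}, "F": {"0": "C", "1": "G"},
    "G": {"0": "G", "1": "E"}, "H": {"0": "G", "1": "C"},
}
ACCEPTING = {"C"}


def table_filling(delta, accepting):
    """Return the set of distinguishable pairs, each stored as a frozenset {p, q}."""
    states = sorted(delta)
    alphabet = sorted(delta[states[0]])
    # BASIS: {p, q} is distinguishable if exactly one of p, q is accepting.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    # INDUCTION: mark {p, q} if some symbol a leads to an already marked pair.
    # Repeat full rounds until a round marks nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            if any(frozenset((delta[p][a], delta[q][a])) in marked for a in alphabet):
                marked.add(pair)
                changed = True
    return marked


def equivalent_pairs(delta, accepting):
    """Pairs of distinct states left blank in the table, i.e., found equivalent."""
    all_pairs = {frozenset(pq) for pq in combinations(sorted(delta), 2)}
    return all_pairs - table_filling(delta, accepting)


print(sorted("".join(sorted(pair)) for pair in equivalent_pairs(DELTA, ACCEPTING)))
# ['AE', 'BH', 'DF']
```

With this table the unmarked pairs are exactly {A, E}, {B, H}, and {D, F}, the blank squares of Fig. 4.9.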


Example 4.19: Let us execute the table-filling algorithm on the DFA of 
Fig. 4.8. The final table is shown in Fig. 4.9, where an x indicates pairs of 
distinguishable states, and the blank squares indicate those pairs that have 
been found equivalent. Initially, there are no x’s in the table. 


B | x
C | x | x
D | x | x | x
E |   | x | x | x
F | x | x | x |   | x
G | x | x | x | x | x | x
H | x |   | x | x | x | x | x
    A   B   C   D   E   F   G


Figure 4.9: Table of state inequivalences 


For the basis, since C is the only accepting state, we put x’s in each pair 
that involves C. Now that we know some distinguishable pairs, we can discover 
others. For instance, since {C, H} is distinguishable, and states E and F go to 
H and C, respectively, on input 0, we know that {E, F} is also a distinguishable 
pair. In fact, all the x’s in Fig. 4.9 with the exception of {A, G} and {E, G} 
are discovered simply by looking at the transitions from the pair of states on 
either 0 or on 1, and observing that, for one of those inputs, one state goes to 
C and the other does not. {A, G} and {E, G} are shown distinguishable on the 
next round. On input 1, A and E go to F, while G goes to E, and we already 
know that E and F are distinguishable. 

However, then we can discover no more distinguishable pairs. The three 
remaining pairs, which are therefore equivalent pairs, are {A, E}, {B, H}, and 
{D, F}. For example, consider why we cannot infer that {A, E} is a 
distinguishable pair. On input 0, A and E go to B and H, respectively, and {B, H} 
has not yet been shown distinguishable. On input 1, A and E both go to F, so 
there is no hope of distinguishing them that way. The other two pairs, {B, H} 
and {D, F}, will never be distinguished because they each have identical 
transitions on 0 and identical transitions on 1. Thus, the table-filling algorithm 
stops with the table as shown in Fig. 4.9, which is the correct determination of 
equivalent and distinguishable states. 


Theorem 4.20: If two states are not distinguished by the table-filling algo- 
rithm, then the states are equivalent. 


PROOF: Let us again assume we are talking of the DFA A = (Q, Σ, δ, q₀, F). 
Suppose the theorem is false; that is, there is at least one pair of states {p, q} 
such that 


1. States p and q are distinguishable, in the sense that there is some string 
w such that exactly one of δ̂(p, w) and δ̂(q, w) is accepting, and yet 


2. The table-filling algorithm does not find p and q to be distinguished. 


Call such a pair of states a bad pair. 

If there are bad pairs, then there must be some that are distinguished by the 
shortest strings among all those strings that distinguish bad pairs. Let {p, q} 
be one such bad pair, and let w = a₁a₂⋯aₙ be a string as short as any that 
distinguishes p from q. Then exactly one of δ̂(p, w) and δ̂(q, w) is accepting. 

Observe first that w cannot be ε, since if ε distinguishes a pair of states, 
then that pair is marked by the basis part of the table-filling algorithm. Thus, 
n ≥ 1. 

Consider the states r = δ(p, a₁) and s = δ(q, a₁). States r and s are 
distinguished by the string a₂a₃⋯aₙ, since this string takes r and s to the states 
δ̂(p, w) and δ̂(q, w). However, the string distinguishing r from s is shorter than 
any string that distinguishes a bad pair. Thus, {r, s} cannot be a bad pair. 
Rather, the table-filling algorithm must have discovered that they are 
distinguishable. 

But the inductive part of the table-filling algorithm will not stop until it has 
also inferred that p and q are distinguishable, since it finds that δ(p, a₁) = r is 
distinguishable from δ(q, a₁) = s. We have contradicted our assumption that 
bad pairs exist. If there are no bad pairs, then every pair of distinguishable 
states is distinguished by the table-filling algorithm, and the theorem is true. 
4.4.2 Testing Equivalence of Regular Languages 


The table-filling algorithm gives us an easy way to test if two regular languages 
are the same. Suppose languages L and M are each represented in some way, 
e.g., one by a regular expression and one by an NFA. Convert each represent- 
ation to a DFA. Now, imagine one DFA whose states are the union of the 
states of the DFA’s for L and M. Technically, this DFA has two start states, 
but actually the start state is irrelevant as far as testing state equivalence is 
concerned, so make any state the lone start state. 

Now, test if the start states of the two original DFA’s are equivalent, using 
the table-filling algorithm. If they are equivalent, then L = M, and if not, then 
L ≠ M. 


Figure 4.10: Two equivalent DFA’s 


Example 4.21: Consider the two DFA’s in Fig. 4.10. Each DFA accepts 
the empty string and all strings that end in 0; that is the language of regular 
expression ε + (0 + 1)*0. We can imagine that Fig. 4.10 represents a single 
DFA, with five states A through E. If we apply the table-filling algorithm to 
that automaton, the result is as shown in Fig. 4.11. 


B | x
C |   | x
D |   | x |
E | x |   | x | x
    A   B   C   D


Figure 4.11: The table of distinguishabilities for Fig. 4.10 
To see how the table is filled out, we start by placing x’s in all pairs of 
states where exactly one of the states is accepting. It turns out that there is 
no more to do. The four remaining pairs, {A, C}, {A, D}, {C, D}, and {B, E}, 
are all equivalent pairs. You should check that no more distinguishable pairs 
are discovered in the inductive part of the table-filling algorithm. For instance, 
with the table as in Fig. 4.11, we cannot distinguish the pair {A, D} because 
on 0 they go to themselves, and on 1 they go to the pair {B, E}, which has 
not yet been distinguished. Since A and C are found equivalent by this test, 
and those states were the start states of the two original automata, we conclude 
that these DFA’s do accept the same language. 
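Under the assumptions that both DFA’s share one alphabet and use disjoint state names, the equivalence test just illustrated amounts to a few lines on top of table-filling. The sketch below is ours: the dictionary encoding and the names table_filling and dfas_equivalent are hypothetical, and because Fig. 4.10 is not reproduced in the text, the transitions are a reconstruction chosen to agree with Example 4.21 (A and D loop on 0 and move to B and E on 1).

```python
from itertools import combinations


def table_filling(delta, accepting):
    """Return the set of distinguishable pairs {p, q} (as frozensets)."""
    states = sorted(delta)
    alphabet = sorted(delta[states[0]])
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}          # basis
    changed = True
    while changed:                                              # induction, in rounds
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[p][a], delta[q][a])) in marked
                    for a in alphabet):
                marked.add(pair)
                changed = True
    return marked


def dfas_equivalent(dfa1, dfa2):
    """Test L(dfa1) = L(dfa2) by table-filling on the union of the two DFAs.
    Assumes disjoint state names and a common alphabet."""
    delta = {**dfa1["delta"], **dfa2["delta"]}
    accepting = dfa1["accepting"] | dfa2["accepting"]
    pair = frozenset((dfa1["start"], dfa2["start"]))
    return pair not in table_filling(delta, accepting)


# Assumed encodings of the two DFA's of Fig. 4.10 (language of ε + (0 + 1)*0).
M1 = {"start": "A", "accepting": {"A"},
      "delta": {"A": {"0": "A", "1": "B"}, "B": {"0": "A", "1": "B"}}}
M2 = {"start": "C", "accepting": {"C", "D"},
      "delta": {"C": {"0": "D", "1": "E"}, "D": {"0": "D", "1": "E"},
                "E": {"0": "C", "1": "E"}}}

print(dfas_equivalent(M1, M2))  # True
```

Merging the two transition dictionaries plays the role of the single five-state DFA; no start state is needed for it, because only the pair of original start states is examined.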


The time to fill out the table, and thus to decide whether two states are 
equivalent, is polynomial in the number of states. If there are n states, then 
there are (n choose 2), or n(n − 1)/2, pairs of states. In one round, we consider all pairs 
of states, to see if one of their successor pairs has been found distinguishable, 
so a round surely takes no more than O(n²) time. Moreover, if on some round, 
no additional x’s are placed in the table, then the algorithm ends. Thus, there 
can be no more than O(n²) rounds, and O(n⁴) is surely an upper bound on the 
running time of the table-filling algorithm. 

However, a more careful algorithm can fill the table in O(n²) time. The 
idea is to initialize, for each pair of states {r, s}, a list of those pairs {p, q} that 
“depend on” {r, s}. That is, if {r, s} is found distinguishable, then {p, q} is 
distinguishable. We create the lists initially by examining each pair of states 
{p, q}, and for each of the fixed number of input symbols a, we put {p, q} on 
the list for the pair of states {δ(p, a), δ(q, a)}, which are the successor states for 
p and q on input a. 

If we ever find {r,s} to be distinguishable, then we go down the list for 
{r,s}. For each pair on that list that is not already distinguishable, we make 
that pair distinguishable, and we put the pair on a queue of pairs whose lists 
we must check similarly. 

The total work of this algorithm is proportional to the sum of the lengths 
of the lists, since we are at all times either adding something to the lists 
(initialization) or examining a member of the list for the first and last time (when 
we go down the list for a pair that has been found distinguishable). Since the 
size of the input alphabet is considered a constant, each pair of states is put on 
O(1) lists. As there are O(n²) pairs, the total work is O(n²). 
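The list-based refinement can be sketched as follows. This is our own rendering of the idea just described, with dependency lists built in one pass and a queue of newly distinguished pairs; it is not code from the text, the function name is hypothetical, and the transition table standing in for Fig. 4.8 is an assumption consistent with the transitions quoted in Examples 4.18 and 4.19.

```python
from collections import defaultdict, deque
from itertools import combinations

# Assumed transition table for Fig. 4.8; C is the only accepting state.
DELTA = {
    "A": {"0": "B", "1": "F"}, "B": {"0": "G", "1": "C"},
    "C": {"0": "A", "1": "C"}, "D": {"0": "C", "1": "G"},
    "E": {"0": "H", "1": "F"}, "F": {"0": "C", "1": "G"},
    "G": {"0": "G", "1": "E"}, "H": {"0": "G", "1": "C"},
}
ACCEPTING = {"C"}


def table_filling_lists(delta, accepting):
    """Table-filling with dependency lists and a queue; returns distinguishable pairs."""
    states = sorted(delta)
    alphabet = sorted(delta[states[0]])
    # depends[{r, s}] lists the pairs {p, q} whose successors on some symbol are {r, s}.
    depends = defaultdict(list)
    for p, q in combinations(states, 2):
        for a in alphabet:
            r, s = delta[p][a], delta[q][a]
            if r != s:
                depends[frozenset((r, s))].append(frozenset((p, q)))
    # Basis pairs are marked and queued first.
    marked, queue = set(), deque()
    for p, q in combinations(states, 2):
        if (p in accepting) != (q in accepting):
            marked.add(frozenset((p, q)))
            queue.append(frozenset((p, q)))
    # A pair is queued only when first marked, so each list is scanned at most once.
    while queue:
        for pair in depends[queue.popleft()]:
            if pair not in marked:
                marked.add(pair)
                queue.append(pair)
    return marked


all_pairs = {frozenset(pq) for pq in combinations(sorted(DELTA), 2)}
unmarked = all_pairs - table_filling_lists(DELTA, ACCEPTING)
print(sorted("".join(sorted(pair)) for pair in unmarked))  # ['AE', 'BH', 'DF']
```

The result agrees with the round-by-round table of Fig. 4.9; only the order in which x’s are placed differs.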


4.4.3 Minimization of DFA’s 


Another important consequence of the test for equivalence of states is that we 
can “minimize” DFA’s. That is, for each DFA we can find an equivalent DFA 
that has as few states as any DFA accepting the same language. Moreover, 
except for our ability to call the states by whatever names we choose, this 
minimum-state DFA is unique for the language. The algorithm is as follows: 


1. First, eliminate any state that cannot be reached from the start state. 
2. Then, partition the remaining states into blocks, so that all states in the 
same block are equivalent, and no pair of states from different blocks is 
equivalent. Theorem 4.24, below, shows that we can always make such a 
partition. 


Example 4.22: Consider the table of Fig. 4.9, where we determined the state 
equivalences and distinguishabilities for the states of Fig. 4.8. The partition 
of the states into equivalent blocks is ({A, E}, {B, H}, {C}, {D, F}, {G}). 
Notice that the three pairs of states that are equivalent are each placed in a 
block together, while the states that are distinguishable from all the other states 
are each in a block alone. 

For the automaton of Fig. 4.10, the partition is ({A,C,D}, {B,E}). This 
example shows that we can have more than two states in a block. It may 
appear fortuitous that A, C, and D can all live together in a block, because 
every pair of them is equivalent, and none of them is equivalent to any other 
state. However, as we shall see in the next theorem to be proved, this situation 
is guaranteed by our definition of “equivalence” for states. 


Theorem 4.23: The equivalence of states is transitive. That is, if in some 
DFA A = (Q, Σ, δ, q₀, F) we find that states p and q are equivalent, and we also 
find that q and r are equivalent, then it must be that p and r are equivalent. 


PROOF: Note that transitivity is a property we expect of any relationship called 
“equivalence.” However, simply calling something “equivalence” doesn’t make 
it transitive; we must prove that the name is justified. 

Suppose that the pairs {p, q} and {q, r} are equivalent, but pair {p, r} is 
distinguishable. Then there is some input string w such that exactly one of 
δ̂(p, w) and δ̂(r, w) is an accepting state. Suppose, by symmetry, that δ̂(p, w) 
is the accepting state. 

Now consider whether δ̂(q, w) is accepting or not. If it is accepting, then 
{q, r} is distinguishable, since δ̂(q, w) is accepting, and δ̂(r, w) is not. If δ̂(q, w) 
is nonaccepting, then {p, q} is distinguishable for a similar reason. We conclude 
by contradiction that {p, r} was not distinguishable, and therefore this pair is 
equivalent. 


We can use Theorem 4.23 to justify the obvious algorithm for partitioning 
states. For each state q, construct a block that consists of q and all the states 
that are equivalent to q. We must show that the resulting blocks are a partition; 
that is, no state is in two distinct blocks. 

First, observe that all states in any block are mutually equivalent. That is, 
if p and r are two states in the block of states equivalent to q, then p and r are 
equivalent to each other, by Theorem 4.23. 

Suppose that there are two overlapping, but not identical blocks. That 
is, there is a block B that includes states p and q, and another block C that 
includes p but not q. Since p and q are in a block together, they are equivalent. 
Consider how the block C was formed. If it was the block generated by p, then 
q would be in C, because those states are equivalent. Thus, it must be that 
there is some third state s that generated block C; i.e., C is the set of states 
equivalent to s. 

We know that p is equivalent to s, because p is in block C. We also know that 
p is equivalent to q because they are together in block B. By the transitivity of 
Theorem 4.23, q is equivalent to s. But then q belongs in block C, a 
contradiction. We conclude that equivalence of states partitions the states; that is, two 
states either have the same set of equivalent states (including themselves), or 
their equivalent states are disjoint. To conclude the above analysis: 


Theorem 4.24: If we create for each state q of a DFA a block consisting of 
q and all the states equivalent to q, then the different blocks of states form a 
partition of the set of states.⁵ That is, each state is in exactly one block. All 
members of a block are equivalent, and no pair of states chosen from different 
blocks is equivalent. 
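The block construction of Theorem 4.24 can be sketched in Python as below. The sketch assumes, hypothetically, that the equivalent pairs are supplied as a complete list, as the table-filling algorithm produces; it builds each state’s block directly and never computes a transitive closure, which is safe precisely because of Theorem 4.23. Storing blocks as frozensets in a set makes a block that is formed several times appear only once, as footnote 5 notes. The function name is our own.

```python
def partition_blocks(states, equivalent_pairs):
    """For each state q, form the block of q and all states equivalent to q;
    duplicate blocks collapse because they are stored as frozensets in a set."""
    equiv = {q: {q} for q in states}
    for p, q in equivalent_pairs:
        equiv[p].add(q)
        equiv[q].add(p)
    return {frozenset(block) for block in equiv.values()}


# Equivalent pairs read off Fig. 4.9 (for Fig. 4.8) and Fig. 4.11 (for Fig. 4.10).
print(partition_blocks("ABCDEFGH", [("A", "E"), ("B", "H"), ("D", "F")]))
print(partition_blocks("ABCDE", [("A", "C"), ("A", "D"), ("C", "D"), ("B", "E")]))
```

For Fig. 4.10 the block {A, C, D} is generated three times, once from each of its members, yet appears only once in the resulting partition.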


We are now able to state succinctly the algorithm for minimizing a DFA 
A = (Q, Σ, δ, q₀, F). 


1. Use the table-filling algorithm to find all the pairs of equivalent states. 


2. Partition the set of states Q into blocks of mutually equivalent states by 
the method described above. 


3. Construct the minimum-state equivalent DFA B by using the blocks as 
its states. Let γ be the transition function of B. Suppose S is a set of 
equivalent states of A, and a is an input symbol. Then there must exist one 
block T of states such that for all states q in S, δ(q, a) is a member of block 
T. For if not, then input symbol a takes two states p and q of S to states 
in different blocks, and those states are distinguishable by Theorem 4.24. 
That fact lets us conclude that p and q are not equivalent, and they did 
not both belong in S. As a consequence, we can let γ(S, a) = T. In 
addition: 


(a) The start state of B is the block containing the start state of A. 


(b) The set of accepting states of B is the set of blocks containing ac- 
cepting states of A. Note that if one state of a block is accepting, 
then all the states of that block must be accepting. The reason is 
that any accepting state is distinguishable from any nonaccepting 
state, so you can’t have both accepting and nonaccepting states in 
one block of equivalent states. 
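The whole procedure, including the preliminary removal of unreachable states, can be sketched in Python. This is our own compact illustration under stated assumptions, not the book’s code: the function names are hypothetical, table-filling is done in rounds as in Section 4.4.1, and γ(S, a) is computed from an arbitrary member of S, which the argument in step 3 shows is safe. As a small test we use our reconstruction of the DFA with states C, D, and E from Fig. 4.10, whose exact transitions the text does not give.

```python
from itertools import combinations


def reachable(delta, start):
    """States reachable from the start state (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        for r in delta[stack.pop()].values():
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def minimize(delta, start, accepting):
    """Return (gamma, start block, accepting blocks) of the minimum-state DFA."""
    alphabet = sorted(delta[start])
    states = sorted(reachable(delta, start))        # eliminate unreachable states
    # Table-filling on the reachable states: basis, then rounds of induction.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[p][a], delta[q][a])) in marked
                    for a in alphabet):
                marked.add(pair)
                changed = True
    # Block of q: q together with every state not distinguished from q.
    block_of = {q: frozenset(r for r in states
                             if r == q or frozenset((q, r)) not in marked)
                for q in states}
    blocks = set(block_of.values())
    # gamma(S, a) = block containing delta(q, a) for any q in S (all members agree).
    gamma = {S: {a: block_of[delta[next(iter(S))][a]] for a in alphabet}
             for S in blocks}
    accepting_blocks = {S for S in blocks if S & accepting}
    return gamma, block_of[start], accepting_blocks


# Assumed encoding of the DFA with states C, D, E in Fig. 4.10; C is the start state.
DELTA = {"C": {"0": "D", "1": "E"}, "D": {"0": "D", "1": "E"},
         "E": {"0": "C", "1": "E"}}
gamma, start_block, accepting_blocks = minimize(DELTA, "C", {"C", "D"})
for S, moves in sorted(gamma.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(S), {a: sorted(T) for a, T in moves.items()})
```

On this input the result has two states, {C, D} and {E}; the start block {C, D} is the only accepting block, it goes to itself on 0 and to {E} on 1, and {E} goes to {C, D} on 0 and to itself on 1.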


⁵You should remember that the same block may be formed several times, starting from 
different states. However, the partition consists of the different blocks, so this block appears 
only once in the partition. 
Figure 4.12: Minimum-state DFA equivalent to Fig. 4.8 


Example 4.25: Let us minimize the DFA from Fig. 4.8. We established the 
blocks of the state partition in Example 4.22. Figure 4.12 shows the minimum- 
state automaton. Its five states correspond to the five blocks of equivalent states 
for the automaton of Fig. 4.8. 

The start state is {A, E}, since A was the start state of Fig. 4.8. The only 
accepting state is {C}, since C is the only accepting state of Fig. 4.8. Notice 
that the transitions of Fig. 4.12 properly reflect the transitions of Fig. 4.8. For 
instance, Fig. 4.12 has a transition on input 0 from {A, E} to {B,H}. That 
makes sense, because in Fig. 4.8, A goes to B on input 0, and E goes to H. 
Likewise, on input 1, {A, E} goes to {D, F}. If we examine Fig. 4.8, we find 
that both A and E go to F on input 1, so the selection of the successor of 
{A, E} on input 1 is also correct. Note that the fact that neither A nor E goes to 
D on input 1 is not important. You may check that all of the other transitions 
are also proper. 


4.4.4 Why the Minimized DFA Can’t Be Beaten 


Suppose we have a DFA A, and we minimize it to construct a DFA M, using the 
partitioning method of Theorem 4.24. That theorem shows that we can’t group 
the states of A into fewer groups and still have an equivalent DFA. However, 
could there be another DFA N, unrelated to A, that accepts the same language 
as A and M, yet has fewer states than M? We can prove by contradiction that 
N does not exist. 

First, run the state-distinguishability process of Section 4.4.1 on the states of 
M and N together, as if they were one DFA. We may assume that the states of 
M and N have no names in common, so the transition function of the combined 
automaton is the union of the transition rules of M and N, with no interaction. 
States are accepting in the combined DFA if and only if they are accepting in 
the DFA from which they come. 


Minimizing the States of an NFA 


You might imagine that the same state-partition technique that minimizes 
the states of a DFA could also be used to find a minimum-state NFA 
equivalent to a given NFA or DFA. While we can, by a process of exhaustive 
enumeration, find an NFA with as few states as possible accepting a given 
regular language, we cannot simply group the states of some given NFA 
for the language. 

An example is in Fig. 4.13. No two of the three states are equivalent. 
Surely accepting state B is distinguishable from nonaccepting states A and 
C. However, A and C are distinguishable by input 0. The successors of C 
are A alone, which does not include an accepting state, while the successors 
of A are {A, B}, which does include an accepting state. Thus, grouping 
equivalent states does not reduce the number of states of Fig. 4.13. 

However, we can find a smaller NFA for the same language if we 
simply remove state C. Note that A and B alone accept all strings ending 
in 0, while adding state C does not allow us to accept any other strings. 


Figure 4.13: An NFA that cannot be minimized by state equivalence 

The start states of M and N are indistinguishable because L(M) = L(N). 
Further, if {p,q} are indistinguishable, then their successors on any one input 
symbol are also indistinguishable. The reason is that if we could distinguish 
the successors, then we could distinguish p from q. 

Neither M nor N could have an inaccessible state, or else we could eliminate 
that state and have an even smaller DFA for the same language. Thus, every 
state of M is indistinguishable from at least one state of N. To see why, suppose 
p is a state of M. Then there is some string a₁a₂⋯aᵢ that takes the start 
state of M to state p. This string also takes the start state of N to some state 
q. Since we know the start states are indistinguishable, we also know that their 
successors under input symbol a₁ are indistinguishable. Then, the successors 
of those states on input a₂ are indistinguishable, and so on, until we conclude 
that p and q are indistinguishable. 

Since N has fewer states than M, there are two states of M that are in- 
distinguishable from the same state of N, and therefore indistinguishable from 
each other. But M was designed so that all its states are distinguishable from 
each other. We have a contradiction, so the assumption that N exists is wrong, 
and M in fact has as few states as any equivalent DFA for A. Formally, we 
have proved: 


Theorem 4.26: If A is a DFA, and M the DFA constructed from A by the 
algorithm described in the statement of Theorem 4.24, then M has as few states 
as any DFA equivalent to A. 


In fact we can say something even stronger than Theorem 4.26. There must 
be a one-to-one correspondence between the states of any other minimum-state 
DFA N and the DFA M. The reason is that we argued above how each state of M 
must be equivalent to one state of N, and no state of M can be equivalent to 
two states of N. We can similarly argue that no state of N can be equivalent 
to two states of M, although each state of N must be equivalent to one of M’s 
states. Thus, the minimum-state DFA equivalent to A is unique except for a 
possible renaming of the states. 
Figure 4.14: A DFA to be minimized 


4.4.5 Exercises for Section 4.4 
* Exercise 4.4.1: In Fig. 4.14 is the transition table of a DFA. 
a) Draw the table of distinguishabilities for this automaton. 
b) Construct the minimum-state equivalent DFA. 


Exercise 4.4.2: Repeat Exercise 4.4.1 for the DFA of Fig. 4.15. 


Figure 4.15: Another DFA to minimize 


!! Exercise 4.4.3: Suppose that p and q are distinguishable states of a given 
DFA A with n states. As a function of n, what is the tightest upper bound on 
how long the shortest string that distinguishes p from q can be? 


4.5 Summary of Chapter 4 


+ The Pumping Lemma: If a language is regular, then every sufficiently long 
string in the language has a nonempty substring that can be “pumped,” 
that is, repeated any number of times while the resulting strings are also 
in the language. This fact can be used to prove that many different 
languages are not regular. 


+ Operations That Preserve the Property of Being a Regular Language: 
There are many operations that, when applied to regular languages, yield 
a regular language as a result. Among these are union, concatenation, clo- 
sure, intersection, complementation, difference, reversal, homomorphism 
(replacement of each symbol by an associated string), and inverse homo- 
morphism. 


+ Testing Emptiness of Regular Languages: There is an algorithm that, 
given a representation of a regular language, such as an automaton or 
regular expression, tells whether or not the represented language is the 
empty set. 


+ Testing Membership in a Regular Language: There is an algorithm that, 
given a string and a representation of a regular language, tells whether or 
not the string is in the language. 


+ Testing Distinguishability of States: Two states of a DFA are distinguish- 
able if there is an input string that takes exactly one of the two states to 
an accepting state. By starting with only the fact that pairs consisting 
of one accepting and one nonaccepting state are distinguishable, and trying 
to discover additional pairs of distinguishable states by finding pairs 
whose successors on one input symbol are distinguishable, we can discover 
all pairs of distinguishable states. 


+ Minimizing Deterministic Finite Automata: We can partition the states 
of any DFA into groups of mutually indistinguishable states. Members of 
two different groups are always distinguishable. If we replace each group 
by a single state, we get an equivalent DFA that has as few states as any 
DFA for the same language. 


4.6 Gradiance Problems for Chapter 4 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 4.1: Design the minimum-state DFA that accepts all and only the 
strings of 0’s and 1’s that end in 010. To verify that you have designed the 
correct automaton, we will ask you to identify the true statement in a list of 
choices. These choices will involve: 


1. The number of loops (transitions from a state to itself). 
2. The number of transitions into a state (including loops) on input 1. 


3. The number of transitions into a state (including loops) on input 0. 


Count the number of transitions into each of your states (“in-transitions”) on 
input 1 and also on input 0. Count the number of loops on input 1 and on 
input 0. Then, find the true statement in the following list. 


Problem 4.2: Here is the transition table of a DFA [shown on-line by the 
Gradiance system]. Find the minimum-state DFA equivalent to the above. 
Then, identify in the list below the pair of equivalent states (states that get 
merged in the minimization process). 


Problem 4.3: Here is the transition table of a DFA that we shall call M [shown 
on-line by the Gradiance system]. Find the minimum-state DFA equivalent to 
the above. States in the minimum-state DFA are each the merger of some of 
the states of M. Find in the list below a set of states of M that forms one state 
of the minimum-state DFA. 
Problem 4.4: The language of regular expression (0 + 10)* is the set of all 
strings of 0’s and 1’s such that every 1 is immediately followed by a 0. Describe 
the complement of this language (with respect to the alphabet {0, 1}) and iden- 
tify in the list below the regular expression whose language is the complement 
of L((0 + 10)*). 


Problem 4.5: The homomorphism h is defined by h(a) = 01 and h(b) = 10. 
What is h(X)? [X is a string that will be provided by the Gradiance system]. 


Problem 4.6: If h is the homomorphism defined by h(a) = 0 and h(b) = ε, 
which of the following strings is in h⁻¹(000)? 


Problem 4.7: Let h be the homomorphism defined by h(a) = 01, h(b) = 10, 
h(c) = 0, and h(d) = 1. If we take any string w in (0 + 1)*, h⁻¹(w) contains 
some number of strings, N(w). For example, h⁻¹(1100) = {ddcc, dbc}; i.e., 
N(1100) = 2. We can calculate the number of strings in h⁻¹(w) by a recursion 
on the length of w. For example, if w = 00x for some string x, then N(w) = 
N(0x), since the first 0 in w can only be produced from c, not from a. Complete 
the reasoning necessary to compute N(w) for any string w in (0 + 1)*. Then, 
choose the correct value of N(X) [X is a value that will be provided by the 
Gradiance system]. 


Problem 4.8: The operation DM(L) is defined as follows: 
1. Throw away every even-length string from L. 
2. For each odd-length string, remove the middle character. 


For example, if L = {001, 1100, 10101}, then DM(L) = {01,1001}. That is, 
even-length string 1100 is deleted, the middle character of 001 is removed to 
make 01, and the middle character of 10101 is removed to make 1001. It turns 
out that if L is a regular language, DM(L) may or may not be regular. For 
each of the following languages L, determine what DM(L) is, and tell whether 
or not it is regular. 


1. L₁: the language of regular expression (01)*0. 
2. L₂: the language of regular expression (0 + 1)*1(0 + 1)*. 
3. L₃: the language of regular expression (101)*. 
4. L₄: the language of regular expression 00*11*. 
Now, identify the true statement below. 


Problem 4.9: Find, in the list below, a regular expression whose language is 
the reversal of the language of this regular expression. [The regular expression 
will be provided by the Gradiance system] 


Problem 4.10: If h(a) = 01, h(b) = 0, and h(c) = 10, which of the following 
strings is in h⁻¹(010010)? 
4.7 References for Chapter 4 


Except for the obvious closure properties of regular expressions — union, con- 
catenation, and star — that were shown by Kleene [6], almost all results about 
closure properties of the regular languages mimic similar results about context- 
free languages (the class of languages we study in the next chapters). Thus, 
the pumping lemma for regular languages is a simplification of a correspond- 
ing result for context-free languages by Bar-Hillel, Perles, and Shamir [1]. The 
same paper indirectly gives us several of the other closure properties shown 
here. However, the closure under inverse homomorphism is from [2]. 

The quotient operation introduced in Exercise 4.2.2 is from [3]. In fact, that 
paper talks about a more general operation where in place of a single symbol a 
is any regular language. The series of operations of the “partial removal” type, 
starting with Exercise 4.2.8 on the first halves of strings in a regular language, 
began with [8]. Seiferas and McNaughton [9] worked out the general case of 
when a removal operation preserves regular languages. 

The original decision algorithms, such as emptiness, finiteness, and member- 
ship for regular languages, are from [7]. Algorithms for minimizing the states 
of a DFA appear there and in [5]. The most efficient algorithm for finding the 
minimum-state DFA is in [4]. 


1. Y. Bar-Hillel, M. Perles, and E. Shamir, “On formal properties of simple 
phrase-structure grammars,” Z. Phonetik. Sprachwiss. Kommunikationsforsch.
14 (1961), pp. 143-172.


2. S. Ginsburg and G. Rose, “Operations which preserve definability in lan- 
guages,” J. ACM 10:2 (1963), pp. 175-195. 


3. S. Ginsburg and E. H. Spanier, “Quotients of context-free languages,” J. 
ACM 10:4 (1963), pp. 487-492. 


4. J. E. Hopcroft, “An n log n algorithm for minimizing the states in a finite
automaton,” in Z. Kohavi (ed.) The Theory of Machines and Computa- 
tions, Academic Press, New York, 1971, pp. 189-196. 


5. D. A. Huffman, “The synthesis of sequential switching circuits,” J. Franklin
Inst. 257:3-4 (1954), pp. 161-190 and 275-303.


6. S. C. Kleene, “Representation of events in nerve nets and finite automata,”
in C. E. Shannon and J. McCarthy, Automata Studies, Princeton Univ. 
Press, 1956, pp. 3-42. 


7. E. F. Moore, “Gedanken experiments on sequential machines,” in C. E. 
Shannon and J. McCarthy, Automata Studies, Princeton Univ. Press, 
1956, pp. 129-153. 


8. R. E. Stearns and J. Hartmanis, “Regularity-preserving modifications of 
regular expressions,” Information and Control 6:1 (1963), pp. 55-69. 




9. J. I. Seiferas and R. McNaughton, “Regularity-preserving modifications,” 
Theoretical Computer Science 2:2 (1976), pp. 147-154. 


Chapter 5 


Context-Free Grammars and Languages


We now turn our attention away from the regular languages to a larger class of 
languages, called the “context-free languages.” These languages have a natu- 
ral, recursive notation, called “context-free grammars.” Context-free grammars 
have played a central role in compiler technology since the 1960’s; they turned 
the implementation of parsers (functions that discover the structure of a pro- 
gram) from a time-consuming, ad-hoc implementation task into a routine job 
that can be done in an afternoon. More recently, the context-free grammar has 
been used to describe document formats, via the so-called document-type defi- 
nition (DTD) that is used in the XML (extensible markup language) community 
for information exchange on the Web. 

In this chapter, we introduce the context-free grammar notation, and show 
how grammars define languages. We discuss the “parse tree,” a picture of the 
structure that a grammar places on the strings of its language. The parse tree 
is the product of a parser for a programming language and is the way that the 
structure of programs is normally captured. 

There is an automaton-like notation, called the “pushdown automaton,” 
that also describes all and only the context-free languages; we introduce the 
pushdown automaton in Chapter 6. While less important than finite automata, 
we shall find the pushdown automaton, especially its equivalence to context-free 
grammars as a language-defining mechanism, to be quite useful when we explore 
the closure and decision properties of the context-free languages in Chapter 7. 


5.1 Context-Free Grammars

We shall begin by introducing the context-free grammar notation informally.
After seeing some of the important capabilities of these grammars, we offer
formal definitions. We show how to define a grammar formally, and introduce
the process of “derivation,” whereby it is determined which strings are in the
language of the grammar.


5.1.1 An Informal Example 


Let us consider the language of palindromes. A palindrome is a string that reads 
the same forward and backward, such as otto or madamimadam (“Madam, I’m 
Adam,” allegedly the first thing Eve heard in the Garden of Eden). Put another 
way, string $w$ is a palindrome if and only if $w = w^R$. To make things simple,
we shall consider describing only the palindromes with alphabet $\{0,1\}$. This
language includes strings like 0110, 11011, and $\epsilon$, but not 011 or 0101.

It is easy to verify that the language $L_{pal}$ of palindromes of 0’s and 1’s is
not a regular language. To do so, we use the pumping lemma. If $L_{pal}$ is a
regular language, let $n$ be the associated constant, and consider the palindrome
$w = 0^n10^n$. If $L_{pal}$ is regular, then we can break $w$ into $w = xyz$ with
$|xy| \le n$ and $y \ne \epsilon$; since the first $n$ symbols of $w$ are all 0’s,
$y$ consists of one or more 0’s from the first group. Thus, $xz$, which would also
have to be in $L_{pal}$ if $L_{pal}$ were regular, would have fewer 0’s to the left of the
lone 1 than there are to the right of the 1. Therefore $xz$ cannot be a palindrome.
We have now contradicted the assumption that $L_{pal}$ is a regular language.
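For small values of $n$, the case analysis above can be checked mechanically. The sketch below (function names are our own, hypothetical choices) tries every split $w = xyz$ of $w = 0^n10^n$ that the pumping lemma permits and confirms that the pumped-down string $xz$ is never a palindrome. It only illustrates the argument for a few concrete $n$; the proof itself must work for the unknown constant $n$.

```python
def is_palindrome(w: str) -> bool:
    """A string is a palindrome iff it equals its reversal."""
    return w == w[::-1]


def every_pumped_down_string_fails(n: int) -> bool:
    """For w = 0^n 1 0^n, try every split w = xyz with |xy| <= n and |y| >= 1,
    and check that the pumped-down string xz is never a palindrome."""
    w = "0" * n + "1" + "0" * n
    for xy_len in range(1, n + 1):        # |xy| <= n, so y lies among the first 0's
        for x_len in range(xy_len):       # |y| = xy_len - x_len >= 1
            xz = w[:x_len] + w[xy_len:]
            if is_palindrome(xz):
                return False
    return True


if __name__ == "__main__":
    print(all(every_pumped_down_string_fails(n) for n in range(1, 10)))  # True
```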

There is a natural, recursive definition of when a string of 0’s and 1’s is in
$L_{pal}$. It starts with a basis saying that a few obvious strings are in $L_{pal}$, and
then exploits the idea that if a string is a palindrome, it must begin and end
with the same symbol. Further, when the first and last symbols are removed,
the resulting string must also be a palindrome. That is:


BASIS: $\epsilon$, 0, and 1 are palindromes.


INDUCTION: If $w$ is a palindrome, so are $0w0$ and $1w1$. No string is a
palindrome of 0’s and 1’s, unless it follows from this basis and induction rule.
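The basis and induction translate almost line for line into a recursive membership test. The following is a minimal sketch (the name `in_L_pal` is our own); it deliberately follows the recursive definition rather than simply comparing $w$ with its reversal.

```python
def in_L_pal(w: str) -> bool:
    """Membership in L_pal, following the recursive definition literally."""
    # BASIS: epsilon, 0, and 1 are palindromes.
    if w in ("", "0", "1"):
        return True
    # INDUCTION: 0w0 and 1w1 are palindromes whenever w is.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return in_L_pal(w[1:-1])
    # Nothing else is a palindrome of 0's and 1's.
    return False


if __name__ == "__main__":
    for s in ["0110", "11011", "", "011", "0101"]:
        print(repr(s), in_L_pal(s))
```

Each recursive call strips the first and last symbols, so the recursion mirrors the way the induction rule builds $0w0$ and $1w1$ from $w$.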


A context-free grammar is a formal notation for expressing such recursive 
definitions of languages. A grammar consists of one or more variables that 
represent classes of strings, i.e., languages. In this example we have need for 
only one variable $P$, which represents the set of palindromes; that is the class of
strings forming the language $L_{pal}$. There are rules that say how the strings in
each class are constructed. The construction can use symbols of the alphabet, 
strings that are already known to be in one of the classes, or both. 


Example 5.1: The rules that define the palindromes, expressed in the context- 
free grammar notation, are shown in Fig. 5.1. We shall see in Section 5.1.2 what 
the rules mean. 

The first three rules form the basis. They tell us that the class of palindromes
includes the strings $\epsilon$, 0, and 1. None of the right sides of these rules (the
portions following the arrows) contains a variable, which is why they form a
basis for the definition.

The last two rules form the inductive part of the definition. For instance,
rule 4 says that if we take any string $w$ from the class $P$, then $0w0$ is also in
class $P$. Rule 5 likewise tells us that $1w1$ is also in $P$.
1. $P \to \epsilon$
2. $P \to 0$
3. $P \to 1$
4. $P \to 0P0$
5. $P \to 1P1$


Figure 5.1: A context-free grammar for palindromes 


5.1.2 Definition of Context-Free Grammars 


There are four important components in a grammatical description of a lan- 
guage: 


1. There is a finite set of symbols that form the strings of the language being 
defined. This set was {0,1} in the palindrome example we just saw. We 
call this alphabet the terminals, or terminal symbols. 


2. There is a finite set of variables, sometimes also called nonterminals or
syntactic categories. Each variable represents a language; i.e., a set of 
strings. In our example above, there was only one variable, P, which we 
used to represent the class of palindromes over alphabet {0, 1}. 


3. One of the variables represents the language being defined; it is called the 
start symbol. Other variables represent auxiliary classes of strings that 
are used to help define the language of the start symbol. In our example, 
P, the only variable, is the start symbol. 


4. There is a finite set of productions or rules that represent the recursive 
definition of a language. Each production consists of: 


(a) A variable that is being (partially) defined by the production. This 
variable is often called the head of the production. 


(b) The production symbol $\to$.


(c) A string of zero or more terminals and variables. This string, called 
the body of the production, represents one way to form strings in the 
language of the variable of the head. In so doing, we leave terminals 
unchanged and substitute for each variable of the body any string 
that is known to be in the language of that variable. 


We saw an example of productions in Fig. 5.1. 


The four components just described form a context-free grammar, or just gram- 
mar, or CFG. We shall represent a CFG G by its four components, that is, 
G = (V,T, P, S), where V is the set of variables, T the terminals, P the set of 
productions, and S the start symbol. 
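To make the four components concrete, here is one possible encoding of $G = (V, T, P, S)$ as a Python data structure, instantiated for the palindrome grammar of Fig. 5.1. The class name, field names, and the choice to write an $\epsilon$-body as the empty tuple are our own assumptions. The check that $V$ and $T$ are disjoint is a standard convention that this section does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset  # V
    terminals: frozenset  # T
    productions: tuple    # P: (head, body) pairs; body is a tuple of symbols, () for epsilon
    start: str            # S

    def __post_init__(self):
        # Assumption: variables and terminals are disjoint sets of symbols.
        if self.variables & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start symbol must be a variable")
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"head {head!r} is not a variable")
            for sym in body:
                if sym not in self.variables and sym not in self.terminals:
                    raise ValueError(f"unknown symbol {sym!r} in a body")


# The palindrome grammar of Fig. 5.1.
G_pal = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(("P", ()), ("P", ("0",)), ("P", ("1",)),
                 ("P", ("0", "P", "0")), ("P", ("1", "P", "1"))),
    start="P",
)
```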
Example 5.2: The grammar $G_{pal}$ for the palindromes is represented by


$$G_{pal} = (\{P\}, \{0,1\}, A, P)$$


where A represents the set of five productions that we saw in Fig. 5.1. 


Example 5.3: Let us explore a more complex CFG that represents (a simplification of)
expressions in a typical programming language. First, we shall limit
ourselves to the operators + and *, representing addition and multiplication. 
We shall allow arguments to be identifiers, but instead of allowing the full set of 
typical identifiers (letters followed by zero or more letters and digits), we shall 
allow only the letters a and b and the digits 0 and 1. Every identifier must 
begin with a or b, which may be followed by any string in {a,b,0,1}*. 

We need two variables in this grammar. One, which we call E, represents 
expressions. It is the start symbol and represents the language of expressions 
we are defining. The other variable, I, represents identifiers. Its language is 
actually regular; it is the language of the regular expression 


$$(a+b)(a+b+0+1)^*$$


However, we shall not use regular expressions directly in grammars. Rather, 
we use a set of productions that say essentially the same thing as this regular 
expression. 


1. $E \to I$
2. $E \to E + E$
3. $E \to E * E$
4. $E \to (E)$
5. $I \to a$
6. $I \to b$
7. $I \to Ia$
8. $I \to Ib$
9. $I \to I0$
10. $I \to I1$


Figure 5.2: A context-free grammar for simple expressions 


The grammar for expressions is stated formally as $G = (\{E, I\}, T, P, E)$,
where $T$ is the set of symbols $\{+, *, (, ), a, b, 0, 1\}$ and $P$ is the set of productions
shown in Fig. 5.2. We interpret the productions as follows. 

Rule (1) is the basis rule for expressions. It says that an expression can 
be a single identifier. Rules (2) through (4) describe the inductive case for 
expressions. Rule (2) says that an expression can be two expressions connected 
by a plus sign; rule (3) says the same with a multiplication sign. Rule (4) says
that if we take any expression and put matching parentheses around it, the
result is also an expression.


Compact Notation for Productions


It is convenient to think of a production as “belonging” to the variable
of its head. We shall often use remarks like “the productions for $A$” or
“$A$-productions” to refer to the productions whose head is variable $A$. We
may write the productions for a grammar by listing each variable once, and
then listing all the bodies of the productions for that variable, separated by
vertical bars. That is, the productions $A \to \alpha_1, A \to \alpha_2, \ldots, A \to \alpha_n$ can
be replaced by the notation $A \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_n$. For instance, the grammar
for palindromes from Fig. 5.1 can be written as $P \to \epsilon \mid 0 \mid 1 \mid 0P0 \mid 1P1$.

Rules (5) through (10) describe identifiers $I$. The basis is rules (5) and (6);
they say that a and b are identifiers. The remaining four rules are the inductive 
case. They say that if we have any identifier, we can follow it by a, b, 0, or 1, 
and the result will be another identifier. 
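The claim that productions 5 through 10 say “essentially the same thing” as the regular expression $(a+b)(a+b+0+1)^*$ can be spot-checked on all strings up to a fixed length. The sketch below (hypothetical function names) generates identifiers by applying the basis and induction rules for $I$ and compares them with the strings matched by the equivalent Python pattern `[ab][ab01]*`. Agreement up to a bounded length is evidence for the claim, not a proof of it.

```python
import re
from itertools import product


def identifiers_from_grammar(max_len: int) -> set:
    """Identifiers of length 1..max_len produced by rules (5)-(10) of Fig. 5.2."""
    found = {"a", "b"}        # rules (5) and (6): a and b are identifiers
    frontier = {"a", "b"}
    while frontier:
        # rules (7)-(10): an identifier followed by a, b, 0, or 1 is an identifier
        frontier = {s + c for s in frontier for c in "ab01" if len(s) < max_len}
        found |= frontier
    return found


def identifiers_from_regex(max_len: int) -> set:
    """Strings over {a,b,0,1} of length 1..max_len matching (a+b)(a+b+0+1)*."""
    pattern = re.compile(r"[ab][ab01]*")
    return {"".join(t) for n in range(1, max_len + 1)
            for t in product("ab01", repeat=n)
            if pattern.fullmatch("".join(t))}


if __name__ == "__main__":
    print(identifiers_from_grammar(5) == identifiers_from_regex(5))  # True
    print(len(identifiers_from_grammar(5)))                          # 682
```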


5.1.3 Derivations Using a Grammar 


We apply the productions of a CFG to infer that certain strings are in the 
language of a certain variable. There are two approaches to this inference. The 
more conventional approach is to use the rules from body to head. That is, we 
take strings known to be in the language of each of the variables of the body, 
concatenate them, in the proper order, with any terminals appearing in the 
body, and infer that the resulting string is in the language of the variable in 
the head. We shall refer to this procedure as recursive inference. 

There is another approach to defining the language of a grammar, in which 
we use the productions from head to body. We expand the start symbol using 
one of its productions (i.e., using a production whose head is the start symbol). 
We further expand the resulting string by replacing one of the variables by the 
body of one of its productions, and so on, until we derive a string consisting 
entirely of terminals. The language of the grammar is all strings of terminals 
that we can obtain in this way. This use of grammars is called derivation. 

We shall begin with an example of the first approach — recursive inference. 
However, it is often more natural to think of grammars as used in derivations, 
and we shall next develop the notation for describing these derivations. 


Example 5.4: Let us consider some of the inferences we can make using the 
grammar for expressions in Fig. 5.2. Figure 5.3 summarizes these inferences. 
For example, line (i) says that we can infer string a is in the language for 
$I$ by using production 5. Lines (ii) through (iv) say we can infer that b00
is an identifier by using production 6 once (to get the b) and then applying
production 9 twice (to attach the two 0’s).


         String         For language   Production   String(s)
         inferred       of             used         used
(i)      a              I              5            -
(ii)     b              I              6            -
(iii)    b0             I              9            (ii)
(iv)     b00            I              9            (iii)
(v)      a              E              1            (i)
(vi)     b00            E              1            (iv)
(vii)    a+b00          E              2            (v), (vi)
(viii)   (a+b00)        E              4            (vii)
(ix)     a*(a+b00)      E              3            (v), (viii)


Figure 5.3: Inferring strings using the grammar of Fig. 5.2 


Lines (v) and (vi) exploit production 1 to infer that, since any identifier is 
an expression, the strings a and b00, which we inferred in lines (i) and (iv) to 
be identifiers, are also in the language of variable E. Line (vii) uses produc- 
tion 2 to infer that the sum of these identifiers is an expression; line (viii) uses 
production 4 to infer that the same string with parentheses around it is also an 
expression, and line (ix) uses production 3 to multiply the identifier a by the 
expression we had discovered in line (viii). 
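Recursive inference applies a production from body to head: it takes strings already known to belong to the body's variables, splices them together with the body's terminals, and concludes membership in the head's language. The sketch below replays Fig. 5.3 line by line. The dictionary layout and the function name `infer` are our own illustrative choices, and strings are written without spaces.

```python
VARIABLES = {"E", "I"}
# Fig. 5.2, numbered as in the text; bodies are tuples of symbols.
PRODUCTIONS = {
    1: ("E", ("I",)),
    2: ("E", ("E", "+", "E")),
    3: ("E", ("E", "*", "E")),
    4: ("E", ("(", "E", ")")),
    5: ("I", ("a",)),
    6: ("I", ("b",)),
    7: ("I", ("I", "a")),
    8: ("I", ("I", "b")),
    9: ("I", ("I", "0")),
    10: ("I", ("I", "1")),
}


def infer(prod_no, *facts):
    """Use production prod_no from body to head.
    facts: one (variable, string) pair per variable of the body, left to right."""
    head, body = PRODUCTIONS[prod_no]
    facts = list(facts)
    pieces = []
    for sym in body:
        if sym in VARIABLES:
            if not facts:
                raise ValueError("too few facts for this body")
            var, s = facts.pop(0)
            if var != sym:
                raise ValueError(f"expected a string of {sym}, got one of {var}")
            pieces.append(s)
        else:
            pieces.append(sym)  # terminals are copied unchanged
    if facts:
        raise ValueError("too many facts for this body")
    return head, "".join(pieces)


# Replaying Fig. 5.3:
i    = infer(5)            # ('I', 'a')
ii   = infer(6)            # ('I', 'b')
iii  = infer(9, ii)        # ('I', 'b0')
iv   = infer(9, iii)       # ('I', 'b00')
v    = infer(1, i)         # ('E', 'a')
vi   = infer(1, iv)        # ('E', 'b00')
vii  = infer(2, v, vi)     # ('E', 'a+b00')
viii = infer(4, vii)       # ('E', '(a+b00)')
ix   = infer(3, v, viii)   # ('E', 'a*(a+b00)')
```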


The process of deriving strings by applying productions from head to body
requires the definition of a new relation symbol $\Rightarrow$. Suppose $G = (V, T, P, S)$ is
a CFG. Let $\alpha A \beta$ be a string of terminals and variables, with $A$ a variable. That
is, $\alpha$ and $\beta$ are strings in $(V \cup T)^*$, and $A$ is in $V$. Let $A \to \gamma$ be a production
of $G$. Then we say $\alpha A \beta \underset{G}{\Rightarrow} \alpha \gamma \beta$. If $G$ is understood, we just say $\alpha A \beta \Rightarrow \alpha \gamma \beta$.
Notice that one derivation step replaces any variable anywhere in the string by
the body of one of its productions.
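One step of $\Rightarrow$ can be computed directly from this definition: for each occurrence of a variable $A$ (splitting the string into $\alpha A \beta$) and each body $\gamma$ of an $A$-production, produce $\alpha\gamma\beta$. In the sketch below we assume, as holds for Fig. 5.2, that every symbol is a single character, so sentential forms can be ordinary Python strings; the name `one_step` is hypothetical.

```python
# Fig. 5.2 in compact form: variable -> list of bodies.
GRAMMAR = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def one_step(form: str) -> set:
    """All strings beta' with form => beta', replacing ANY variable occurrence
    by ANY body of one of its productions."""
    results = set()
    for pos, sym in enumerate(form):
        if sym in GRAMMAR:                    # form = alpha A beta with A = sym
            alpha, beta = form[:pos], form[pos + 1:]
            for body in GRAMMAR[sym]:         # A -> gamma
                results.add(alpha + body + beta)
    return results


if __name__ == "__main__":
    print(len(one_step("E*E")))  # 7 distinct strings
```

Because the step may rewrite either $E$ in $E*E$, the eight candidate replacements collapse to seven distinct strings ($E*E*E$ arises both ways).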

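We may extend the $\Rightarrow$ relationship to represent zero, one, or many derivation
steps, much as the transition function $\delta$ of a finite automaton was extended to
$\hat{\delta}$. For derivations, we use a $*$ to denote “zero or more steps,” as follows:


BASIS: For any string $\alpha$ of terminals and variables, we say $\alpha \overset{*}{\underset{G}{\Rightarrow}} \alpha$. That is,
any string derives itself.


INDUCTION: If $\alpha \overset{*}{\underset{G}{\Rightarrow}} \beta$ and $\beta \underset{G}{\Rightarrow} \gamma$, then $\alpha \overset{*}{\underset{G}{\Rightarrow}} \gamma$. That is, if $\alpha$ can become $\beta$
by zero or more steps, and one more step takes $\beta$ to $\gamma$, then $\alpha$ can become $\gamma$.
Put another way, $\alpha \overset{*}{\underset{G}{\Rightarrow}} \beta$ means that there is a sequence of strings $\gamma_1, \gamma_2, \ldots, \gamma_n$,
for some $n \ge 1$, such that


1. $\alpha = \gamma_1$,
2. $\beta = \gamma_n$, and
3. For $i = 1, 2, \ldots, n-1$, we have $\gamma_i \Rightarrow \gamma_{i+1}$.


If grammar $G$ is understood, then we use $\overset{*}{\Rightarrow}$ in place of $\overset{*}{\underset{G}{\Rightarrow}}$.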


Example 5.5: The inference that $a*(a+b00)$ is in the language of variable $E$
can be reflected in a derivation of that string, starting with the string $E$. Here
is one such derivation:


$$E \Rightarrow E*E \Rightarrow I*E \Rightarrow a*E \Rightarrow$$

$$a*(E) \Rightarrow a*(E+E) \Rightarrow a*(I+E) \Rightarrow a*(a+E) \Rightarrow$$

$$a*(a+I) \Rightarrow a*(a+I0) \Rightarrow a*(a+I00) \Rightarrow a*(a+b00)$$


At the first step, $E$ is replaced by the body of production 3 (from Fig. 5.2).
At the second step, production 1 is used to replace the first $E$ by $I$, and so
on. Notice that we have systematically adopted the policy of always replacing
the leftmost variable in the string. However, at each step we may choose which
variable to replace, and we can use any of the productions for that variable.
For instance, at the second step, we could have replaced the second $E$ by $(E)$,
using production 4. In that case, we would say $E*E \Rightarrow E*(E)$. We could also
have chosen to make a replacement that would fail to lead to the same string
of terminals. A simple example would be if we used production 2 at the first
step, and said $E \Rightarrow E+E$. No replacements for the two $E$’s could ever turn
$E+E$ into $a*(a+b00)$.

We can use the $\overset{*}{\Rightarrow}$ relationship to condense the derivation. We know $E \overset{*}{\Rightarrow} E$
by the basis. Repeated use of the inductive part gives us $E \overset{*}{\Rightarrow} E*E$, $E \overset{*}{\Rightarrow} I*E$,
and so on, until finally $E \overset{*}{\Rightarrow} a*(a+b00)$.
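The sequence characterization of $\overset{*}{\Rightarrow}$ (a list $\gamma_1, \ldots, \gamma_n$ in which each consecutive pair is one $\Rightarrow$ step) gives a direct way to check a claimed derivation. The sketch below, with hypothetical function names and single-character symbols as before, verifies the derivation of this example.

```python
# Fig. 5.2 in compact form; every symbol is one character, so forms are plain strings.
GRAMMAR = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def derives_in_one_step(src: str, dst: str) -> bool:
    """True if src => dst: some variable of src is replaced by one of its bodies."""
    for pos, sym in enumerate(src):
        if sym in GRAMMAR:
            for body in GRAMMAR[sym]:
                if src[:pos] + body + src[pos + 1:] == dst:
                    return True
    return False


def is_derivation(seq) -> bool:
    """True if gamma_1, ..., gamma_n (n >= 1) witnesses gamma_1 =>* gamma_n.
    A one-element sequence is the basis case: every string derives itself."""
    return len(seq) >= 1 and all(
        derives_in_one_step(a, b) for a, b in zip(seq, seq[1:]))


EXAMPLE_5_5 = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)",
               "a*(a+E)", "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]

if __name__ == "__main__":
    print(is_derivation(EXAMPLE_5_5))  # True
```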

The two viewpoints — recursive inference and derivation — are equivalent. 
That is, a string of terminals w is inferred to be in the language of some variable 
$A$ if and only if $A \overset{*}{\Rightarrow} w$. However, the proof of this fact requires some work,
and we leave it to Section 5.2. 


5.1.4 Leftmost and Rightmost Derivations 


In order to restrict the number of choices we have in deriving a string, it is
often useful to require that at each step we replace the leftmost variable by one
of its production bodies. Such a derivation is called a leftmost derivation, and
we indicate that a derivation is leftmost by using the relations $\underset{lm}{\Rightarrow}$ and $\overset{*}{\underset{lm}{\Rightarrow}}$, for
one or many steps, respectively. If the grammar $G$ that is being used is not
obvious, we can place the name $G$ below the arrow in either of these symbols.

Similarly, it is possible to require that at each step the rightmost variable
is replaced by one of its bodies. If so, we call the derivation rightmost and use
the symbols $\underset{rm}{\Rightarrow}$ and $\overset{*}{\underset{rm}{\Rightarrow}}$ to indicate one or many rightmost derivation steps,
respectively. Again, the name of the grammar may appear below these symbols
if it is not clear which grammar is being used.


Notation for CFG Derivations


There are a number of conventions in common use that help us remember
the role of the symbols we use when discussing CFG’s. Here are the
conventions we shall use:


1. Lower-case letters near the beginning of the alphabet, a, b, and so
on, are terminal symbols. We shall also assume that digits and other
characters such as + or parentheses are terminals.


2. Upper-case letters near the beginning of the alphabet, A, B, and so
on, are variables.


3. Lower-case letters near the end of the alphabet, such as w or z, are
strings of terminals. This convention reminds us that the terminals
are analogous to the input symbols of an automaton.


4. Upper-case letters near the end of the alphabet, such as X or Y, are
either terminals or variables.


5. Lower-case Greek letters, such as $\alpha$ and $\beta$, are strings consisting of
terminals and/or variables.


There is no special notation for strings that consist of variables only, since
this concept plays no important role. However, a string named $\alpha$ or an-
other Greek letter might happen to have only variables.


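Example 5.6: The derivation of Example 5.5 was actually a leftmost derivation.
Thus, we can describe the same derivation by:


$$E \underset{lm}{\Rightarrow} E*E \underset{lm}{\Rightarrow} I*E \underset{lm}{\Rightarrow} a*E \underset{lm}{\Rightarrow}$$

$$a*(E) \underset{lm}{\Rightarrow} a*(E+E) \underset{lm}{\Rightarrow} a*(I+E) \underset{lm}{\Rightarrow} a*(a+E) \underset{lm}{\Rightarrow}$$

$$a*(a+I) \underset{lm}{\Rightarrow} a*(a+I0) \underset{lm}{\Rightarrow} a*(a+I00) \underset{lm}{\Rightarrow} a*(a+b00)$$


We can also summarize the leftmost derivation by saying $E \overset{*}{\underset{lm}{\Rightarrow}} a*(a+b00)$, or
express several steps of the derivation by expressions such as $E*E \overset{*}{\underset{lm}{\Rightarrow}} a*(E)$.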
There is a rightmost derivation that uses the same replacements for each
variable, although it makes the replacements in a different order. This rightmost
derivation is:


$$E \underset{rm}{\Rightarrow} E*E \underset{rm}{\Rightarrow} E*(E) \underset{rm}{\Rightarrow} E*(E+E) \underset{rm}{\Rightarrow}$$

$$E*(E+I) \underset{rm}{\Rightarrow} E*(E+I0) \underset{rm}{\Rightarrow} E*(E+I00) \underset{rm}{\Rightarrow} E*(E+b00) \underset{rm}{\Rightarrow}$$

$$E*(I+b00) \underset{rm}{\Rightarrow} E*(a+b00) \underset{rm}{\Rightarrow} I*(a+b00) \underset{rm}{\Rightarrow} a*(a+b00)$$


This derivation allows us to conclude $E \overset{*}{\underset{rm}{\Rightarrow}} a*(a+b00)$.


Any derivation has an equivalent leftmost and an equivalent rightmost derivation.
That is, if $w$ is a terminal string, and $A$ a variable, then $A \overset{*}{\Rightarrow} w$ if and
only if $A \overset{*}{\underset{lm}{\Rightarrow}} w$, and $A \overset{*}{\Rightarrow} w$ if and only if $A \overset{*}{\underset{rm}{\Rightarrow}} w$. We shall also prove these
claims in Section 5.2.
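Whether a derivation is leftmost or rightmost depends only on which variable occurrence each step rewrites, so both properties can be checked the same way with a different choice of position. The sketch below (hypothetical names, single-character symbols) confirms that the two derivations of Example 5.6 are leftmost and rightmost respectively, that neither has the other property, and that both end in $a*(a+b00)$.

```python
GRAMMAR = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def _replaces_at(src: str, dst: str, pos: int) -> bool:
    """True if dst is src with the variable at position pos replaced by one of its bodies."""
    return any(src[:pos] + body + src[pos + 1:] == dst for body in GRAMMAR[src[pos]])


def _is_restricted_derivation(seq, pick) -> bool:
    """Every step must rewrite the variable occurrence chosen by pick(positions)."""
    for src, dst in zip(seq, seq[1:]):
        positions = [i for i, sym in enumerate(src) if sym in GRAMMAR]
        if not positions or not _replaces_at(src, dst, pick(positions)):
            return False
    return True


def is_leftmost(seq) -> bool:
    return _is_restricted_derivation(seq, lambda ps: ps[0])


def is_rightmost(seq) -> bool:
    return _is_restricted_derivation(seq, lambda ps: ps[-1])


LEFTMOST = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)",
            "a*(a+E)", "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]
RIGHTMOST = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(E+I)", "E*(E+I0)",
             "E*(E+I00)", "E*(E+b00)", "E*(I+b00)", "E*(a+b00)",
             "I*(a+b00)", "a*(a+b00)"]

if __name__ == "__main__":
    print(is_leftmost(LEFTMOST), is_rightmost(RIGHTMOST))  # True True
```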


5.1.5 The Language of a Grammar 


If G = (V,T, P, S) is a CFG, the language of G, denoted L(G), is the set of 
terminal strings that have derivations from the start symbol. That is, 


$$L(G) = \{\, w \text{ in } T^* \mid S \overset{*}{\underset{G}{\Rightarrow}} w \,\}$$


If a language L is the language of some context-free grammar, then L is said to 
be a context-free language, or CFL. For instance, we asserted that the grammar 
of Fig. 5.1 defined the language of palindromes over alphabet {0,1}. Thus, the 
set of palindromes is a context-free language. We can prove that statement, as 
follows. 


Theorem 5.7: $L(G_{pal})$, where $G_{pal}$ is the grammar of Example 5.1, is the set
of palindromes over $\{0,1\}$.


PROOF: We shall prove that a string $w$ in $\{0,1\}^*$ is in $L(G_{pal})$ if and only if it
is a palindrome; i.e., $w = w^R$.


(If) Suppose $w$ is a palindrome. We show by induction on $|w|$ that $w$ is in
$L(G_{pal})$.


BASIS: We use lengths 0 and 1 as the basis. If $|w| = 0$ or $|w| = 1$, then $w$ is $\epsilon$,
0, or 1. Since there are productions $P \to \epsilon$, $P \to 0$, and $P \to 1$, we conclude
that $P \overset{*}{\Rightarrow} w$ in any of these basis cases.


INDUCTION: Suppose $|w| \ge 2$. Since $w = w^R$, $w$ must begin and end with the
same symbol. That is, $w = 0x0$ or $w = 1x1$. Moreover, $x$ must be a palindrome;
that is, $x = x^R$. Note that we need the fact that $|w| \ge 2$ to infer that there are
two distinct 0’s or 1’s, at either end of $w$.
If $w = 0x0$, then we invoke the inductive hypothesis to claim that $P \overset{*}{\Rightarrow} x$.
Then there is a derivation of $w$ from $P$, namely $P \Rightarrow 0P0 \overset{*}{\Rightarrow} 0x0 = w$. If
$w = 1x1$, the argument is the same, but we use the production $P \to 1P1$ at
the first step. In either case, we conclude that $w$ is in $L(G_{pal})$ and complete
the proof.


(Only-if) Now, we assume that $w$ is in $L(G_{pal})$; that is, $P \overset{*}{\Rightarrow} w$. We must
conclude that $w$ is a palindrome. The proof is an induction on the number of
steps in a derivation of $w$ from $P$.


BASIS: If the derivation is one step, then it must use one of the three productions
that do not have $P$ in the body. That is, the derivation is $P \Rightarrow \epsilon$, $P \Rightarrow 0$,
or $P \Rightarrow 1$. Since $\epsilon$, 0, and 1 are all palindromes, the basis is proven.


INDUCTION: Now, suppose that the derivation takes $n+1$ steps, where $n \ge 1$,
and the statement is true for all derivations of $n$ steps. That is, if $P \overset{*}{\Rightarrow} x$ in $n$
steps, then $x$ is a palindrome.

Consider an $(n+1)$-step derivation of $w$, which must be of the form

$$P \Rightarrow 0P0 \overset{*}{\Rightarrow} 0x0 = w$$

or $P \Rightarrow 1P1 \overset{*}{\Rightarrow} 1x1 = w$, since $n+1$ steps is at least two steps, and the
productions $P \to 0P0$ and $P \to 1P1$ are the only productions whose use allows
additional steps of a derivation. Note that in either case, $P \overset{*}{\Rightarrow} x$ in $n$ steps.
By the inductive hypothesis, we know that $x$ is a palindrome; that is, $x = x^R$.

But if so, then $0x0$ and $1x1$ are also palindromes. For instance, $(0x0)^R =
0x^R0 = 0x0$. We conclude that $w$ is a palindrome, which completes the proof.
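Theorem 5.7 can be sanity-checked for short strings by enumerating $L(G_{pal})$ directly from its definition as the set of terminal strings derivable from $P$. The sketch below (hypothetical names) does a breadth-first search over leftmost derivations and discards any sentential form with more than `max_terminals` terminals, which is safe because a derivation step never erases a terminal. Termination relies on a property of $G_{pal}$ that does not hold for every CFG: each sentential form contains at most one variable, so only finitely many forms survive the pruning.

```python
from collections import deque

# G_pal from Fig. 5.1; "" is the body of P -> epsilon.
PAL = {"P": ["", "0", "1", "0P0", "1P1"]}


def language_up_to(grammar: dict, start: str, max_terminals: int) -> set:
    """Terminal strings with at most max_terminals symbols derivable from start.
    Assumes single-character symbols and (as in G_pal) finitely many sentential
    forms with a bounded number of terminals."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        positions = [i for i, s in enumerate(form) if s in grammar]
        if not positions:
            words.add(form)           # no variables left: a member of L(G)
            continue
        pos = positions[0]            # expand the leftmost variable
        for body in grammar[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            n_terminals = sum(1 for s in new if s not in grammar)
            if n_terminals <= max_terminals and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


if __name__ == "__main__":
    words = language_up_to(PAL, "P", 6)
    print(all(w == w[::-1] for w in words), len(words))  # True 29
```

Restricting the search to leftmost derivations loses nothing here, since (as claimed in Section 5.1.4) every terminal string with a derivation also has a leftmost one.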


5.1.6 Sentential Forms 

Derivations from the start symbol produce strings that have a special role. We
call these “sentential forms.” That is, if $G = (V, T, P, S)$ is a CFG, then any
string $\alpha$ in $(V \cup T)^*$ such that $S \overset{*}{\Rightarrow} \alpha$ is a sentential form. If $S \overset{*}{\underset{lm}{\Rightarrow}} \alpha$, then
$\alpha$ is a left-sentential form, and if $S \overset{*}{\underset{rm}{\Rightarrow}} \alpha$, then $\alpha$ is a right-sentential form.
Note that the language $L(G)$ is those sentential forms that are in $T^*$; i.e., they
consist solely of terminals.


Example 5.8: Consider the grammar for expressions from Fig. 5.2. For example,
$E*(I+E)$ is a sentential form, since there is a derivation


$$E \Rightarrow E*E \Rightarrow E*(E) \Rightarrow E*(E+E) \Rightarrow E*(I+E)$$


However, this derivation is neither leftmost nor rightmost, since at the last step,
the middle $E$ is replaced.

As an example of a left-sentential form, consider $a*E$, with the leftmost
derivation


$$E \underset{lm}{\Rightarrow} E*E \underset{lm}{\Rightarrow} I*E \underset{lm}{\Rightarrow} a*E$$


Additionally, the derivation


$$E \underset{rm}{\Rightarrow} E*E \underset{rm}{\Rightarrow} E*(E) \underset{rm}{\Rightarrow} E*(E+E)$$


shows that $E*(E+E)$ is a right-sentential form.


The Form of Proofs About Grammars


Theorem 5.7 is typical of proofs that show a grammar defines a particular,
informally defined language. We first develop an inductive hypothesis that
states what properties the strings derived from each variable have. In this
example, there was only one variable, $P$, so we had only to claim that its
strings were palindromes.

We prove the “if” part: that if a string $w$ satisfies the informal statement
about the strings of one of the variables $A$, then $A \overset{*}{\Rightarrow} w$. In our
example, since $P$ is the start symbol, we stated “$P \overset{*}{\Rightarrow} w$” by saying that
$w$ is in the language of the grammar. Typically, we prove the “if” part by
induction on the length of $w$. If there are $k$ variables, then the inductive
statement to be proved has $k$ parts, which must be proved as a mutual
induction.

We must also prove the “only-if” part, that if $A \overset{*}{\Rightarrow} w$, then $w$ satisfies
the informal statement about the strings derived from variable $A$.
Again, in our example, since we had to deal only with the start symbol
$P$, we assumed that $w$ was in the language of $G_{pal}$ as an equivalent to
$P \overset{*}{\Rightarrow} w$. The proof of this part is typically by induction on the number of
steps in the derivation. If the grammar has productions that allow two or
more variables to appear in derived strings, then we shall have to break
a derivation of $n$ steps into several parts, one derivation from each of the
variables. These derivations may have fewer than $n$ steps, so we have to
perform an induction assuming the statement for all values $n$ or less, as
discussed in Section 1.4.2.


5.1.7 Exercises for Section 5.1 
Exercise 5.1.1: Design context-free grammars for the following languages: 


* a) The set $\{0^n1^n \mid n \ge 1\}$, that is, the set of all strings of one or more 0’s
followed by an equal number of 1’s. 


*! b) The set $\{a^ib^jc^k \mid i \ne j \text{ or } j \ne k\}$, that is, the set of strings of a’s followed
by b’s followed by c’s, such that there are either a different number of a’s 
and b’s or a different number of b’s and c’s, or both. 


*! c) The set of all strings of a’s and b’s that are not of the form ww, that is,
not equal to any string repeated.


!! d) The set of all strings with twice as many 0’s as 1’s. 


Exercise 5.1.2: The following grammar generates the language of regular
expression $0^*1(0+1)^*$:


$S \to A1B$
$A \to 0A \mid \epsilon$
$B \to 0B \mid 1B \mid \epsilon$


Give leftmost and rightmost derivations of the following strings: 
* a) 00101. 

b) 1001. 

c) 00011. 


Exercise 5.1.3: Show that every regular language is a context-free language. 
Hint: Construct a CFG by induction on the number of operators in the regular 
expression. 


Exercise 5.1.4: A CFG is said to be right-linear if each production body 
has at most one variable, and that variable is at the right end. That is, all 
productions of a right-linear grammar are of the form $A \to wB$ or $A \to w$,
where A and B are variables and w some string of zero or more terminals. 


a) Show that every right-linear grammar generates a regular language. Hint: 
Construct an $\epsilon$-NFA that simulates leftmost derivations, using its state to
represent the lone variable in the current left-sentential form. 


b) Show that every regular language has a right-linear grammar. Hint: Start 
with a DFA and let the variables of the grammar represent states. 


Exercise 5.1.5: Let $T = \{0, 1, (, ), +, *, \emptyset, e\}$. We may think of $T$ as the set of
symbols used by regular expressions over alphabet $\{0,1\}$; the only difference is
that we use $e$ for symbol $\epsilon$, to avoid potential confusion in what follows. Your
task is to design a CFG with set of terminals T that generates exactly the 
regular expressions with alphabet {0, 1}. 


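Exercise 5.1.6: We defined the relation $\overset{*}{\Rightarrow}$ with a basis “$\alpha \overset{*}{\Rightarrow} \alpha$” and an
induction that says “$\alpha \overset{*}{\Rightarrow} \beta$ and $\beta \Rightarrow \gamma$ imply $\alpha \overset{*}{\Rightarrow} \gamma$.” There are several other
ways to define $\overset{*}{\Rightarrow}$ that also have the effect of saying that “$\overset{*}{\Rightarrow}$ is zero or more
$\Rightarrow$ steps.” Prove that the following are true:


a) $\alpha \overset{*}{\Rightarrow} \beta$ if and only if there is a sequence of one or more strings
$\gamma_1, \gamma_2, \ldots, \gamma_n$
such that $\alpha = \gamma_1$, $\beta = \gamma_n$, and for $i = 1, 2, \ldots, n-1$ we have $\gamma_i \Rightarrow \gamma_{i+1}$.


b) If $\alpha \overset{*}{\Rightarrow} \beta$ and $\beta \overset{*}{\Rightarrow} \gamma$, then $\alpha \overset{*}{\Rightarrow} \gamma$. Hint: use induction on the number
of steps in the derivation $\beta \overset{*}{\Rightarrow} \gamma$.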


! Exercise 5.1.7: Consider the CFG G defined by productions: 
$S \to aS \mid Sb \mid a \mid b$


a) Prove by induction on the string length that no string in L(G) has ba as 
a substring. 


b) Describe L(G) informally. Justify your answer using part (a). 
!! Exercise 5.1.8: Consider the CFG G defined by productions: 


$S \to aSbS \mid bSaS \mid \epsilon$


Prove that $L(G)$ is the set of all strings with an equal number of a’s and b’s.


5.2 Parse Trees 


There is a tree representation for derivations that has proved extremely useful. 
This tree shows us clearly how the symbols of a terminal string are grouped 
into substrings, each of which belongs to the language of one of the variables of 
the grammar. But perhaps more importantly, the tree, known as a “parse tree” 
when used in a compiler, is the data structure of choice to represent the source 
program. In a compiler, the tree structure of the source program facilitates 
the translation of the source program into executable code by allowing natural, 
recursive functions to perform this translation process. 

In this section, we introduce the parse tree and show that the existence of 
parse trees is tied closely to the existence of derivations and recursive inferences. 
We shall later study the matter of ambiguity in grammars and languages, which 
is an important application of parse trees. Certain grammars allow a terminal 
string to have more than one parse tree. That situation makes the grammar 
unsuitable for a programming language, since the compiler could not tell the 
structure of certain source programs, and therefore could not with certainty 
deduce what the proper executable code for the program was. 


5.2.1 Constructing Parse Trees 


Let us fix on a grammar G = (V,T,P,S). The parse trees for G are trees with 
the following conditions: 


1. Each interior node is labeled by a variable in V. 


2. Each leaf is labeled by either a variable, a terminal, or $\epsilon$. However, if the
leaf is labeled $\epsilon$, then it must be the only child of its parent.


3. If an interior node is labeled $A$, and its children are labeled
$X_1, X_2, \ldots, X_k$
respectively, from the left, then $A \to X_1X_2\cdots X_k$ is a production in $P$.
Note that the only time one of the $X$’s can be $\epsilon$ is if that is the label of
the only child, and $A \to \epsilon$ is a production of $G$.


Review of Tree Terminology


We assume you have been introduced to the idea of a tree and are familiar
with the commonly used definitions for trees. However, the following will
serve as a review.


• Trees are collections of nodes, with a parent-child relationship. A
node has at most one parent, drawn above the node, and zero or
more children, drawn below. Lines connect parents to their children.
Figures 5.4, 5.5, and 5.6 are examples of trees.


• There is one node, the root, that has no parent; this node appears at
the top of the tree. Nodes with no children are called leaves. Nodes
that are not leaves are interior nodes.


• A child of a child of a ... of a node is a descendant of that node. A parent
of a parent of a ... of a node is an ancestor. Trivially, nodes are ancestors and
descendants of themselves.


• The children of a node are ordered “from the left,” and drawn so. If
node N is to the left of node M, then all the descendants of N are
considered to be to the left of all the descendants of M.
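Conditions 1 through 3 can be turned into a short recursive check. The sketch below is specialized to the palindrome grammar of Fig. 5.1; the `Node` class, the string encoding of bodies, and the use of the character `ε` as a leaf label are all our own illustrative assumptions. Joining child labels into a string works only because every symbol of this grammar is a single character.

```python
from dataclasses import dataclass, field

EPSILON = "ε"
VARIABLES = {"P"}
TERMINALS = {"0", "1"}
# Fig. 5.1: bodies as strings of one-character symbols; "" is the body of P -> epsilon.
PRODUCTIONS = {"P": {"", "0", "1", "0P0", "1P1"}}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def _check(node: Node) -> bool:
    if not node.children:
        # Condition 2: a leaf is labeled by a variable, a terminal, or epsilon.
        return node.label in VARIABLES or node.label in TERMINALS or node.label == EPSILON
    # Condition 1: every interior node is labeled by a variable.
    if node.label not in VARIABLES:
        return False
    labels = [child.label for child in node.children]
    if EPSILON in labels:
        # Condition 2: an epsilon leaf must be the only child,
        # and (condition 3) A -> epsilon must be a production.
        if labels != [EPSILON] or "" not in PRODUCTIONS[node.label]:
            return False
    elif "".join(labels) not in PRODUCTIONS[node.label]:
        # Condition 3: the children, read from the left, spell the body of an A-production.
        return False
    return all(_check(child) for child in node.children)


def is_parse_tree(root: Node) -> bool:
    # A lone epsilon node has no parent, so it cannot satisfy condition 2.
    return root.label != EPSILON and _check(root)


# The tree of Fig. 5.5, for P =>* 0110.
FIG_5_5 = Node("P", [Node("0"),
                     Node("P", [Node("1"), Node("P", [Node(EPSILON)]), Node("1")]),
                     Node("0")])

if __name__ == "__main__":
    print(is_parse_tree(FIG_5_5))  # True
```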


Example 5.9: Figure 5.4 shows a parse tree that uses the expression grammar
of Fig. 5.2. The root is labeled with the variable $E$. We see that the production
used at the root is $E \to E + E$, since the three children of the root have labels
$E$, $+$, and $E$, respectively, from the left. At the leftmost child of the root, the
production $E \to I$ is used, since there is one child of that node, labeled $I$.


Example 5.10: Figure 5.5 shows a parse tree for the palindrome grammar of
Fig. 5.1. The production used at the root is $P \rightarrow 0P0$, and at the middle child
of the root it is $P \rightarrow 1P1$. Note that at the bottom is a use of the production
$P \rightarrow \epsilon$. That use, where the node labeled by the head has one child, labeled ε,
is the only time that a node labeled ε can appear in a parse tree.
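The three conditions above are easy to check mechanically. The following Python sketch is our own illustration, not part of the text, and its representation is an assumption made here: a grammar is a dict from each variable to a list of bodies (tuples of symbols, with `()` standing for an ε-body), an interior node is a tuple `(label, children)`, a leaf is a plain string, and the hypothetical constant `EPS` marks an ε-leaf.

```python
EPS = "ε"  # assumed marker for an ε-leaf (hypothetical convention)

def label(t):
    """Label of a node: the variable of an interior node, or the leaf string itself."""
    return t[0] if isinstance(t, tuple) else t

def is_parse_tree(tree, grammar, terminals):
    """Check conditions 1-3 for a parse tree of the given grammar."""
    if not isinstance(tree, tuple):                   # a leaf on its own
        return tree in grammar or tree in terminals
    head, kids = tree
    if head not in grammar or not kids:               # condition 1: interior nodes are variables
        return False
    if EPS in kids and kids != [EPS]:                 # condition 2: ε only as an only child
        return False
    body = () if kids == [EPS] else tuple(label(k) for k in kids)
    if body not in grammar[head]:                     # condition 3: head -> children is in P
        return False
    return all(k == EPS or is_parse_tree(k, grammar, terminals) for k in kids)

# Palindrome grammar of Fig. 5.1: P -> ε | 0 | 1 | 0P0 | 1P1
PAL = {"P": [(), ("0",), ("1",), ("0", "P", "0"), ("1", "P", "1")]}
# Our transcription of the tree of Fig. 5.5 (Example 5.10)
FIG_5_5 = ("P", ["0", ("P", ["1", ("P", [EPS]), "1"]), "0"])

print(is_parse_tree(FIG_5_5, PAL, {"0", "1"}))  # True
```

Replacing the middle child of the root by a bare ε-leaf, as in `("P", ["0", EPS, "0"])`, violates condition 2 and is rejected.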
Figure 5.4: A parse tree showing the derivation of I + E from E


Figure 5.5: A parse tree showing the derivation $P \overset{*}{\Rightarrow} 0110$


5.2.2 The Yield of a Parse Tree 


If we look at the leaves of any parse tree and concatenate them from the left, we 
get a string, called the yield of the tree, which is always a string that is derived 
from the root variable. The fact that the yield is derived from the root will be 
proved shortly. Of special importance are those parse trees such that: 


1. The yield is a terminal string. That is, all leaves are labeled either with
a terminal or with ε.


2. The root is labeled by the start symbol. 


These are the parse trees whose yields are strings in the language of the under- 
lying grammar. We shall also prove shortly that another way to describe the 
language of a grammar is as the set of yields of those parse trees having the 
start symbol at the root and a terminal string as yield. 
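Computing the yield is a left-to-right walk over the leaves. Here is a minimal sketch that uses an assumed tuple representation: an interior node is `(label, children)`, a leaf is a string, and a hypothetical `EPS` marker stands for ε. The two trees are our transcriptions of Figs. 5.4 and 5.5. Only the second has a terminal string as its yield and the start symbol at its root.

```python
EPS = "ε"  # assumed marker for an ε-leaf (hypothetical convention)

def yield_of(tree):
    """Concatenate the leaf labels from the left; an ε-leaf contributes nothing."""
    if isinstance(tree, tuple):                       # interior node: (label, children)
        return "".join(yield_of(kid) for kid in tree[1])
    return "" if tree == EPS else tree                # leaf

fig_5_4 = ("E", [("E", ["I"]), "+", "E"])                        # yield contains variables
fig_5_5 = ("P", ["0", ("P", ["1", ("P", [EPS]), "1"]), "0"])    # terminal yield

print(yield_of(fig_5_4))  # I+E
print(yield_of(fig_5_5))  # 0110
```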


Example 5.11: Figure 5.6 is an example of a tree with a terminal string as 
yield and the start symbol at the root; it is based on the grammar for expressions 
that we introduced in Fig. 5.2. This tree’s yield is the string a * (a + b00) that 
was derived in Example 5.5. In fact, as we shall see, this particular parse tree 
is a representation of that derivation. 


5.2.3 Inference, Derivations, and Parse Trees 


Each of the ideas that we have introduced so far for describing how a grammar 
works gives us essentially the same facts about strings. That is, given a grammar 
G = (V,T,P,S), we shall show that the following are equivalent: 
Figure 5.6: Parse tree showing a * (a + b00) is in the language of our expression
grammar 


1. The recursive inference procedure determines that terminal string w is in 
the language of variable A. 


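2. $A \overset{*}{\Rightarrow} w$.


3. $A \underset{lm}{\overset{*}{\Rightarrow}} w$.


4. $A \underset{rm}{\overset{*}{\Rightarrow}} w$.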


5. There is a parse tree with root A and yield w. 


In fact, except for the use of recursive inference, which we only defined for 
terminal strings, all the other conditions — the existence of derivations, leftmost 
or rightmost derivations, and parse trees — are also equivalent if w is a string 
that has some variables. 

We need to prove these equivalences, and we do so using the plan of Fig. 5.7. 
That is, each arc in that diagram indicates that we prove a theorem that says 
if w meets the condition at the tail, then it meets the condition at the head of 
the arc. For instance, we shall show in Theorem 5.12 that if w is inferred to be 
in the language of A by recursive inference, then there is a parse tree with root 
A and yield w. 

Note that two of the arcs are very simple and will not be proved formally. If 
w has a leftmost derivation from A, then it surely has a derivation from A, since 
a leftmost derivation is a derivation. Likewise, if w has a rightmost derivation,
then it surely has a derivation. We now proceed to prove the harder steps of
this equivalence.


Figure 5.7: Proving the equivalence of certain statements about grammars


5.2.4 From Inferences to Trees 


Theorem 5.12: Let G = (V,T,P,S) be a CFG. If the recursive inference 
procedure tells us that terminal string w is in the language of variable A, then 
there is a parse tree with root A and yield w. 


PROOF: The proof is an induction on the number of steps used to infer that w 
is in the language of A. 


BASIS: One step. Then only the basis of the inference procedure must have
been used. Thus, there must be a production $A \rightarrow w$. The tree of Fig. 5.8,
where there is one leaf for each position of w, meets the conditions to be a parse
tree for grammar G, and it evidently has yield w and root A. In the special
case that $w = \epsilon$, the tree has a single leaf labeled ε and is a legal parse tree
with root A and yield w.


Figure 5.8: Tree constructed in the basis case of Theorem 5.12


INDUCTION: Suppose that the fact w is in the language of A is inferred after
$n+1$ inference steps, and that the statement of the theorem holds for all strings
x and variables B such that the membership of x in the language of B was
inferred using n or fewer inference steps. Consider the last step of the inference
that w is in the language of A. This inference uses some production for A, say
$A \rightarrow X_1X_2\cdots X_k$, where each $X_i$ is either a variable or a terminal.

We can break w up as $w_1w_2\cdots w_k$, where:


1. If $X_i$ is a terminal, then $w_i = X_i$; i.e., $w_i$ consists of only this one terminal
from the production.
2. If $X_i$ is a variable, then $w_i$ is a string that was previously inferred to be in
the language of $X_i$. That is, this inference about $w_i$ took at most n of the
$n+1$ steps of the inference that w is in the language of A. It cannot take
all $n+1$ steps, because the final step, using production $A \rightarrow X_1X_2\cdots X_k$,
is surely not part of the inference about $w_i$. Consequently, we may apply
the inductive hypothesis to $w_i$ and $X_i$, and conclude that there is a parse
tree with yield $w_i$ and root $X_i$.


Figure 5.9: Tree used in the inductive part of the proof of Theorem 5.12


We then construct a tree with root A and yield w, as suggested in Fig. 5.9.
There is a root labeled A, whose children are $X_1, X_2, \ldots, X_k$. This choice is
valid, since $A \rightarrow X_1X_2\cdots X_k$ is a production of G.

The node for each $X_i$ is made the root of a subtree with yield $w_i$. In case (1),
where $X_i$ is a terminal, this subtree is a trivial tree with a single node labeled
$X_i$. That is, the subtree consists of only this child of the root. Since $w_i = X_i$
in case (1), we meet the condition that the yield of the subtree is $w_i$.

In case (2), $X_i$ is a variable. Then, we invoke the inductive hypothesis to
claim that there is some tree with root $X_i$ and yield $w_i$. This tree is attached
to the node for $X_i$ in Fig. 5.9.

The tree so constructed has root A. Its yield is the yields of the subtrees,
concatenated from left to right. That string is $w_1w_2\cdots w_k$, which is w.
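The construction in this proof can also be carried out by a program. The sketch below is our own illustration, not the book's procedure verbatim. It runs recursive inference bottom-up but only over substrings of one target string w. Each time it infers that w[i:j] is in the language of a variable, it records a tree built as in Fig. 5.9 from subtrees it has already recorded for the pieces. It assumes one-character terminals and a grammar given as a dict of bodies (tuples, with `()` for ε). Trees are `(label, children)` tuples, and `EPS` is a hypothetical ε marker. The helper names are ours.

```python
EPS = "ε"  # assumed marker for an ε-leaf (hypothetical convention)

def yield_of(t):
    if isinstance(t, tuple):
        return "".join(yield_of(k) for k in t[1])
    return "" if t == EPS else t

def _match(body, w, i, j, known, grammar):
    """Children for `body` whose yields concatenate to w[i:j], built only from
    facts already in `known`; None if no such split exists yet."""
    if not body:
        return [] if i == j else None
    first, rest = body[0], body[1:]
    if first not in grammar:                        # terminal: must equal w[i]
        if i < j and w[i] == first:
            tail = _match(rest, w, i + 1, j, known, grammar)
            return None if tail is None else [first] + tail
        return None
    for k in range(i, j + 1):                       # variable: try every split point
        if (first, i, k) in known:
            tail = _match(rest, w, k, j, known, grammar)
            if tail is not None:
                return [known[(first, i, k)]] + tail
    return None

def infer_with_trees(grammar, w):
    """Map (A, i, j) to a parse tree with root A and yield w[i:j], for every
    fact 'w[i:j] is in the language of A' that recursive inference discovers."""
    known = {}
    changed = True
    while changed:                                  # repeat rounds until nothing new
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i in range(len(w) + 1):
                    for j in range(i, len(w) + 1):
                        if (head, i, j) in known:
                            continue
                        if body:
                            kids = _match(body, w, i, j, known, grammar)
                        else:                       # ε-body: a single ε child
                            kids = [EPS] if i == j else None
                        if kids is not None:        # root A over the pieces' subtrees
                            known[(head, i, j)] = (head, kids)
                            changed = True
    return known

EXPR = {  # expression grammar of Fig. 5.2
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
}
w = "a*(a+b00)"
facts = infer_with_trees(EXPR, w)
print(yield_of(facts[("E", 0, len(w))]) == w)  # True
```

Because a fact is recorded only once, and only from facts recorded earlier, every tree is finite, even for grammars with cycles of unit productions.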


5.2.5 From Trees to Derivations 


We shall now show how to construct a leftmost derivation from a parse tree. 
The method for constructing a rightmost derivation uses the same ideas, and 
we shall not explore the rightmost-derivation case. In order to understand how 
derivations may be constructed, we need first to see how one derivation of a 
string from a variable can be embedded within another derivation. An example 
should illustrate the point. 


Example 5.13: Let us again consider the expression grammar of Fig. 5.2. It 
is easy to check that there is a derivation 


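$$E \Rightarrow I \Rightarrow Ib \Rightarrow ab$$
As a result, for any strings $\alpha$ and $\beta$, it is also true that

$$\alpha E\beta \Rightarrow \alpha I\beta \Rightarrow \alpha Ib\beta \Rightarrow \alpha ab\beta$$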
The justification is that we can make the same replacements of production
bodies for heads in the context of $\alpha$ and $\beta$ as we can in isolation.¹

For instance, if we have a derivation that begins $E \Rightarrow E + E \Rightarrow E + (E)$,
we could apply the derivation of $ab$ from the second $E$ by treating “$E + ($” as
$\alpha$ and “$)$” as $\beta$. This derivation would then continue


$$E + (E) \Rightarrow E + (I) \Rightarrow E + (Ib) \Rightarrow E + (ab)$$
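The point of Example 5.13 can be checked mechanically: a derivation stays a derivation when every sentential form is wrapped in the same $\alpha$ and $\beta$. The sketch below assumes single-character grammar symbols and writes the grammar of Fig. 5.2 as a Python dict. `one_step`, `is_derivation`, and `in_context` are hypothetical helper names of our own.

```python
EXPR = {  # expression grammar of Fig. 5.2; one-character symbols assumed
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
}

def one_step(before, after, grammar):
    """True if `after` results from `before` by replacing one variable by one of its bodies."""
    for pos, sym in enumerate(before):
        for body in grammar.get(sym, []):             # terminals have no productions
            if before[:pos] + "".join(body) + before[pos + 1:] == after:
                return True
    return False

def is_derivation(forms, grammar):
    return all(one_step(a, b, grammar) for a, b in zip(forms, forms[1:]))

def in_context(forms, alpha, beta):
    """The same derivation replayed with alpha to the left and beta to the right."""
    return [alpha + f + beta for f in forms]

inner = ["E", "I", "Ib", "ab"]
outer = in_context(inner, "E+(", ")")
print(outer)                       # ['E+(E)', 'E+(I)', 'E+(Ib)', 'E+(ab)']
print(is_derivation(outer, EXPR))  # True
```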


We are now able to prove a theorem that lets us convert a parse tree to a 
leftmost derivation. The proof is an induction on the height of the tree, which is 
the maximum length of a path that starts at the root, and proceeds downward 
through descendants, to a leaf. For instance, the height of the tree in Fig. 5.6 is 
7. The longest root-to-leaf path in this tree goes to the leaf labeled b. Note that 
path lengths conventionally count the edges, not the nodes, so a path consisting 
of a single node is of length 0. 
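Because paths are measured in edges, height has a short recursive definition. The sketch assumes the tuple representation `(label, children)` for interior nodes and plain strings for leaves. `FIG_5_6` is our transcription of the tree in Fig. 5.6, and the function reports the height of 7 stated above.

```python
def height(tree):
    """Edges on the longest root-to-leaf path; a single node has height 0."""
    if not isinstance(tree, tuple):       # leaf
        return 0
    return 1 + max(height(kid) for kid in tree[1])

# Fig. 5.6: the parse tree for a*(a+b00) in the expression grammar of Fig. 5.2
FIG_5_6 = ("E", [
    ("E", [("I", ["a"])]),
    "*",
    ("E", [
        "(",
        ("E", [
            ("E", [("I", ["a"])]),
            "+",
            ("E", [("I", [("I", [("I", ["b"]), "0"]), "0"])]),
        ]),
        ")",
    ]),
])

print(height(FIG_5_6))  # 7
```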


Theorem 5.14: Let G = (V,T, P, S) be a CFG, and suppose there is a parse
tree with root labeled by variable A and with yield w, where w is in T*. Then
there is a leftmost derivation $A \underset{lm}{\overset{*}{\Rightarrow}} w$ in grammar G.


PROOF: We perform an induction on the height of the tree. 


BASIS: The basis is height 1, the least that a parse tree with a yield of terminals
can be. In this case, the tree must look like Fig. 5.8, with a root labeled A and
children that read w, left-to-right. Since this tree is a parse tree, $A \rightarrow w$ must
be a production. Thus, $A \underset{lm}{\Rightarrow} w$ is a one-step, leftmost derivation of w from A.


INDUCTION: If the height of the tree is n, where n > 1, it must look like
Fig. 5.9. That is, there is a root labeled A, with children labeled $X_1, X_2, \ldots, X_k$
from the left. The X's may be either terminals or variables.


1. If $X_i$ is a terminal, define $w_i$ to be the string consisting of $X_i$ alone.


2. If $X_i$ is a variable, then it must be the root of some subtree with a yield
of terminals, which we shall call $w_i$. Note that in this case, the subtree is
of height less than n, so the inductive hypothesis applies to it. That is,
there is a leftmost derivation $X_i \underset{lm}{\overset{*}{\Rightarrow}} w_i$.


Note that $w = w_1w_2\cdots w_k$.


¹In fact, it is this property of being able to make a string-for-variable substitution regardless
of context that gave rise originally to the term “context-free.” There are more powerful
classes of grammars, called “context-sensitive,” where replacements are permitted only if certain
strings appear to the left and/or right. Context-sensitive grammars do not play a major
role in practice today.
We construct a leftmost derivation of w as follows. We begin with the step
$A \underset{lm}{\Rightarrow} X_1X_2\cdots X_k$. Then, for each $i = 1, 2, \ldots, k$, in order, we show that

$$A \underset{lm}{\overset{*}{\Rightarrow}} w_1w_2\cdots w_iX_{i+1}X_{i+2}\cdots X_k$$

This proof is actually another induction, this time on i. For the basis, $i = 0$,
we already know that $A \underset{lm}{\Rightarrow} X_1X_2\cdots X_k$. For the induction, assume that

$$A \underset{lm}{\overset{*}{\Rightarrow}} w_1w_2\cdots w_{i-1}X_iX_{i+1}\cdots X_k$$


a) If $X_i$ is a terminal, do nothing. However, we shall subsequently think of
$X_i$ as the terminal string $w_i$. Thus, we already have

$$A \underset{lm}{\overset{*}{\Rightarrow}} w_1w_2\cdots w_iX_{i+1}X_{i+2}\cdots X_k$$


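b) If $X_i$ is a variable, continue with a derivation of $w_i$ from $X_i$, in the context
of the derivation being constructed. That is, if this derivation is

$$X_i \underset{lm}{\Rightarrow} \alpha_1 \underset{lm}{\Rightarrow} \alpha_2 \cdots \underset{lm}{\Rightarrow} w_i$$

we proceed with

$$\begin{aligned}
w_1w_2\cdots w_{i-1}X_iX_{i+1}\cdots X_k &\underset{lm}{\Rightarrow} w_1w_2\cdots w_{i-1}\alpha_1X_{i+1}\cdots X_k\\
&\underset{lm}{\Rightarrow} w_1w_2\cdots w_{i-1}\alpha_2X_{i+1}\cdots X_k\\
&\underset{lm}{\Rightarrow} \cdots\\
&\underset{lm}{\Rightarrow} w_1w_2\cdots w_iX_{i+1}X_{i+2}\cdots X_k
\end{aligned}$$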


The result is a derivation $A \underset{lm}{\overset{*}{\Rightarrow}} w_1w_2\cdots w_iX_{i+1}\cdots X_k$.


When i = k, the result is a leftmost derivation of w from A. 
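The inductive construction of Theorem 5.14 translates almost line for line into a recursive function. First take the root step. Then replay each child's own leftmost derivation in the context of the already-derived prefix $w_1\cdots w_{i-1}$ and the not-yet-expanded suffix $X_{i+1}\cdots X_k$. This is a sketch under our own assumptions: one-character symbols, trees as `(label, children)` tuples, a hypothetical `EPS` leaf marker, and a terminal yield as the theorem requires. On our transcription of the tree of Fig. 5.6 it reproduces the derivation worked out in Example 5.15.

```python
EPS = "ε"  # assumed marker for an ε-leaf (hypothetical convention)

def label(t):
    return t[0] if isinstance(t, tuple) else t

def yield_of(t):
    if isinstance(t, tuple):
        return "".join(yield_of(k) for k in t[1])
    return "" if t == EPS else t

def leftmost_derivation(tree):
    """Sentential forms of a leftmost derivation from the root, as in Theorem 5.14."""
    head, kids = tree
    # Step at the root: A => X1 X2 ... Xk (an ε child contributes nothing)
    forms = [head, "".join("" if k == EPS else label(k) for k in kids)]
    for i, kid in enumerate(kids):
        if isinstance(kid, tuple):                                # X_i is a variable
            done = "".join(yield_of(k) for k in kids[:i])         # w_1 ... w_{i-1}
            rest = "".join(label(k) for k in kids[i + 1:])        # X_{i+1} ... X_k
            # Replay X_i's own leftmost derivation inside that context
            forms += [done + f + rest for f in leftmost_derivation(kid)[1:]]
        # a terminal X_i needs no steps, since w_i = X_i
    return forms

FIG_5_6 = ("E", [
    ("E", [("I", ["a"])]),
    "*",
    ("E", ["(",
           ("E", [("E", [("I", ["a"])]), "+",
                  ("E", [("I", [("I", [("I", ["b"]), "0"]), "0"])])]),
           ")"]),
])

print(" => ".join(leftmost_derivation(FIG_5_6)))
# E => E*E => I*E => a*E => a*(E) => a*(E+E) => a*(I+E) => a*(a+E) => a*(a+I) => a*(a+I0) => a*(a+I00) => a*(a+b00)
```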


Example 5.15: Let us construct the leftmost derivation for the tree of Fig. 5.6. 
We shall show only the final step, where we construct the derivation from the 
entire tree from derivations that correspond to the subtrees of the root. That is, 
we shall assume that by recursive application of the technique in Theorem 5.14, 
we have deduced that the subtree rooted at the first child of the root has 
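leftmost derivation $E \underset{lm}{\Rightarrow} I \underset{lm}{\Rightarrow} a$, while the subtree rooted at the third child of
the root has leftmost derivation

$$\begin{aligned}
E &\underset{lm}{\Rightarrow} (E) \underset{lm}{\Rightarrow} (E+E) \underset{lm}{\Rightarrow} (I+E) \underset{lm}{\Rightarrow} (a+E) \underset{lm}{\Rightarrow}\\
&(a+I) \underset{lm}{\Rightarrow} (a+I0) \underset{lm}{\Rightarrow} (a+I00) \underset{lm}{\Rightarrow} (a+b00)
\end{aligned}$$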
To build a leftmost derivation for the entire tree, we start with the step at
the root: $E \underset{lm}{\Rightarrow} E*E$. Then, we replace the first $E$ according to its
derivation, following each step by $*E$ to account for the larger context in which that
derivation is used. The leftmost derivation so far is thus

$$E \underset{lm}{\Rightarrow} E*E \underset{lm}{\Rightarrow} I*E \underset{lm}{\Rightarrow} a*E$$


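The * in the production used at the root requires no derivation, so the
above leftmost derivation also accounts for the first two children of the root.
We complete the leftmost derivation by using the derivation of $E \underset{lm}{\overset{*}{\Rightarrow}} (a+b00)$,
in a context where it is preceded by $a*$ and followed by the empty string. This
derivation actually appeared in Example 5.6; it is:

$$\begin{aligned}
E &\underset{lm}{\Rightarrow} E*E \underset{lm}{\Rightarrow} I*E \underset{lm}{\Rightarrow} a*E \underset{lm}{\Rightarrow}\\
&a*(E) \underset{lm}{\Rightarrow} a*(E+E) \underset{lm}{\Rightarrow} a*(I+E) \underset{lm}{\Rightarrow} a*(a+E) \underset{lm}{\Rightarrow}\\
&a*(a+I) \underset{lm}{\Rightarrow} a*(a+I0) \underset{lm}{\Rightarrow} a*(a+I00) \underset{lm}{\Rightarrow} a*(a+b00)
\end{aligned}$$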


A similar theorem lets us convert a tree to a rightmost derivation. The
construction of a rightmost derivation from a tree is almost the same as the
construction of a leftmost derivation. However, after starting with the step
$A \underset{rm}{\Rightarrow} X_1X_2\cdots X_k$, we expand $X_k$ first, using a rightmost derivation, then
expand $X_{k-1}$, and so on, down to $X_1$. Thus, we shall state without further
proof:


Theorem 5.16: Let G = (V,T, P, S) be a CFG, and suppose there is a parse
tree with root labeled by variable A and with yield w, where w is in T*. Then
there is a rightmost derivation $A \underset{rm}{\overset{*}{\Rightarrow}} w$ in grammar G.
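Instead of writing out the mirror-image recursion, the sketch below uses an equivalent iterative formulation. This is our choice, not the book's. It keeps the current frontier of the tree and always expands the frontier's rightmost interior node. Every interior node is a variable, and all leaves are terminals or ε. So that node is exactly the rightmost variable of the current sentential form. Trees are assumed to be `(label, children)` tuples with string leaves and a hypothetical `EPS` marker for ε. `FIG_5_6` is our transcription of the tree of Fig. 5.6.

```python
EPS = "ε"  # assumed marker for an ε-leaf (hypothetical convention)

def rightmost_derivation(tree):
    """Sentential forms obtained by always expanding the rightmost interior node
    on the current frontier of the tree."""
    frontier = [tree]
    forms = []
    while True:
        forms.append("".join(x[0] if isinstance(x, tuple) else ("" if x == EPS else x)
                             for x in frontier))
        interior = [i for i, x in enumerate(frontier) if isinstance(x, tuple)]
        if not interior:
            return forms
        i = interior[-1]                       # the rightmost variable of the form
        frontier[i:i + 1] = frontier[i][1]     # replace it by its children

FIG_5_6 = ("E", [
    ("E", [("I", ["a"])]),
    "*",
    ("E", ["(",
           ("E", [("E", [("I", ["a"])]), "+",
                  ("E", [("I", [("I", [("I", ["b"]), "0"]), "0"])])]),
           ")"]),
])

print(" => ".join(rightmost_derivation(FIG_5_6)))
# E => E*E => E*(E) => E*(E+E) => E*(E+I) => E*(E+I0) => E*(E+I00) => E*(E+b00) => E*(I+b00) => E*(a+b00) => I*(a+b00) => a*(a+b00)
```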


5.2.6 From Derivations to Recursive Inferences 


We now complete the loop suggested by Fig. 5.7 by showing that whenever
there is a derivation $A \overset{*}{\Rightarrow} w$ for some CFG, then the fact that w is in the
language of A is discovered in the recursive inference procedure. Before giving 
the theorem and proof, let us observe something important about derivations. 

Suppose that we have a derivation $A \Rightarrow X_1X_2\cdots X_k \overset{*}{\Rightarrow} w$. Then we can
break w into pieces $w = w_1w_2\cdots w_k$ such that $X_i \overset{*}{\Rightarrow} w_i$. Note that if $X_i$ is
a terminal, then $w_i = X_i$, and the derivation is zero steps. The proof of this
observation is not hard. You can show by induction on the number of steps
of the derivation, that if $X_1X_2\cdots X_k \overset{*}{\Rightarrow} \alpha$, then all the positions of $\alpha$ that
come from expansion of $X_i$ are to the left of all the positions that come from
expansion of $X_j$, if $i < j$.
If $X_i$ is a variable, we can obtain the derivation of $X_i \overset{*}{\Rightarrow} w_i$ by starting
with the derivation $A \overset{*}{\Rightarrow} w$, and stripping away:


a) All the positions of the sentential forms that are either to the left or right
of the positions that are derived from $X_i$, and


b) All the steps that are not relevant to the derivation of $w_i$ from $X_i$.


An example should make this process clear.


Example 5.17: Using the expression grammar of Fig. 5.2, consider the deriva- 
tion 


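$$\begin{aligned}
E &\Rightarrow E*E \Rightarrow E*E+E \Rightarrow I*E+E \Rightarrow I*I+E \Rightarrow I*I+I \Rightarrow\\
&a*I+I \Rightarrow a*b+I \Rightarrow a*b+a
\end{aligned}$$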


Consider the third sentential form, $E*E+E$, and the middle $E$ in this form.²

Starting from $E*E+E$, we may follow the steps of the above derivation,
but strip away whatever positions are derived from the $E*$ to the left of the
central $E$ or derived from the $+E$ to its right. The steps of the derivation then
become $E, E, I, I, I, b, b$. That is, the next step does not change the central $E$,
the step after that changes it to $I$, the next two steps leave it as $I$, the next
changes it to $b$, and the final step does not change what is derived from the
central $E$.

If we take only the steps that change what comes from the central $E$, the
sequence of strings $E, E, I, I, I, b, b$ becomes the derivation $E \Rightarrow I \Rightarrow b$. That
derivation correctly describes how the central $E$ evolves during the complete
derivation.
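The stripping process of this example amounts to tracking which positions of each sentential form descend from the chosen symbol. The sketch uses a hypothetical encoding of our own: a derivation is a starting form plus a list of steps `(pos, body)`, and each step replaces the single symbol at index `pos` by `body`. Symbols are assumed to be one character each. Applied to the steps of Example 5.17 starting from $E*E+E$, it recovers $E \Rightarrow I \Rightarrow b$.

```python
def subderivation(start, steps, target):
    """Follow the symbol at index `target` of the sentential form `start`.
    Each step (pos, body) replaces the single symbol at index pos by body.
    Returns the strings derived from the target, one per step that changes them."""
    form = list(start)
    lo, hi = target, target + 1          # form[lo:hi] descends from the target symbol
    result = [start[target]]
    for pos, body in steps:
        form[pos:pos + 1] = list(body)
        growth = len(body) - 1
        if lo <= pos < hi:               # the step rewrites a descendant of the target
            hi += growth
            result.append("".join(form[lo:hi]))
        elif pos < lo:                   # a step to the left shifts the tracked span
            lo += growth
            hi += growth
        # a step to the right of the span is stripped away
    return result

# Example 5.17, starting from the third sentential form E*E+E
steps = [(0, "I"), (2, "I"), (4, "I"), (0, "a"), (2, "b"), (4, "a")]
print(subderivation("E*E+E", steps, 2))  # ['E', 'I', 'b']
```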


Theorem 5.18: Let G = (V,T, P, S) be a CFG, and suppose there is a derivation
$A \underset{G}{\overset{*}{\Rightarrow}} w$, where w is in T*. Then the recursive inference procedure applied
to G determines that w is in the language of variable A.


PROOF: The proof is an induction on the length of the derivation $A \overset{*}{\Rightarrow} w$.


BASIS: If the derivation is one-step, then $A \rightarrow w$ must be a production. Since
w consists of terminals only, the fact that w is in the language of A will be
discovered in the basis part of the recursive inference procedure.


INDUCTION: Suppose the derivation takes $n+1$ steps, and assume that for
any derivation of n or fewer steps, the statement holds. Write the derivation
as $A \Rightarrow X_1X_2\cdots X_k \overset{*}{\Rightarrow} w$. Then, as discussed prior to the theorem, we can
break w as $w = w_1w_2\cdots w_k$, where:


²Our discussion of finding subderivations from larger derivations assumed we were concerned
with a variable in the second sentential form of some derivation. However, the idea
applies to a variable in any step of a derivation.
a) If $X_i$ is a terminal, then $w_i = X_i$.


b) If $X_i$ is a variable, then $X_i \overset{*}{\Rightarrow} w_i$. Since the first step of the derivation
$A \overset{*}{\Rightarrow} w$ is surely not part of the derivation $X_i \overset{*}{\Rightarrow} w_i$, we know that this
derivation is of n or fewer steps. Thus, the inductive hypothesis applies
to it, and we know that $w_i$ is inferred to be in the language of $X_i$.


Now, we have a production $A \rightarrow X_1X_2\cdots X_k$, with $w_i$ either equal to $X_i$ or
known to be in the language of $X_i$. In the next round of the recursive inference
procedure, we shall discover that $w_1w_2\cdots w_k$ is in the language of A. Since
$w_1w_2\cdots w_k = w$, we have shown that w is inferred to be in the language of A.


5.2.7 Exercises for Section 5.2 


Exercise 5.2.1: For the grammar and each of the strings in Exercise 5.1.2, 
give parse trees. 


Exercise 5.2.2: Suppose that G is a CFG without any productions that have
ε as the right side. If w is in L(G), the length of w is n, and w has a derivation
of m steps, show that w has a parse tree with n + m nodes.


Exercise 5.2.3: Suppose all is as in Exercise 5.2.2, but G may have some
productions with ε as the right side. Show that a parse tree for a string w other
than ε may have as many as n + 2m - 1 nodes, but no more.


Exercise 5.2.4: In Section 5.2.6 we mentioned that if $X_1X_2\cdots X_k \overset{*}{\Rightarrow} \alpha$, then
all the positions of $\alpha$ that come from expansion of $X_i$ are to the left of all the
positions that come from expansion of $X_j$, if $i < j$. Prove this fact. Hint:
Perform an induction on the number of steps in the derivation.


5.3 Applications of Context-Free Grammars 


Context-free grammars were originally conceived by N. Chomsky as a way to 
describe natural languages. That promise has not been fulfilled. However, as 
uses for recursively defined concepts in Computer Science have multiplied, so 
has the need for CFG’s as a way to describe instances of these concepts. We 
shall sketch two of these uses, one old and one new. 


1. Grammars are used to describe programming languages. More impor- 
tantly, there is a mechanical way of turning the language description as 
a CFG into a parser, the component of the compiler that discovers the 
structure of the source program and represents that structure by a parse 
tree. This application is one of the earliest uses of CFG’s; in fact it is 
one of the first ways in which theoretical ideas in Computer Science found 
their way into practice. 
2. The development of XML (Extensible Markup Language) is widely predicted
to facilitate electronic commerce by allowing participants to share
conventions regarding the format of orders, product descriptions, and 
many other kinds of documents. An essential part of XML is the Docu- 
ment Type Definition (DTD), which is essentially a context-free grammar 
that describes the allowable tags and the ways in which these tags may 
be nested. Tags are the familiar keywords with triangular brackets that 
you may know from HTML, e.g., <EM> and </EM> to surround text that 
needs to be emphasized. However, XML tags deal not with the formatting 
of text, but with the meaning of text. For instance, one could surround 
a sequence of characters that was intended to be interpreted as a phone 
number by <PHONE> and </PHONE>. 


5.3.1 Parsers 


Many aspects of a programming language have a structure that may be de- 
scribed by regular expressions. For instance, we discussed in Example 3.9 how 
identifiers could be represented by regular expressions. However, there are also 
some very important aspects of typical programming languages that cannot be 
represented by regular expressions alone. The following are two examples. 


Example 5.19: Typical languages use parentheses and/or brackets in a nested 
and balanced fashion. That is, we must be able to match some left parenthesis 
against a right parenthesis that appears immediately to its right, remove both 
of them, and repeat. If we eventually eliminate all the parentheses, then the 
string was balanced, and if we cannot match parentheses in this way, then it is 
unbalanced. Examples of strings of balanced parentheses are (()), ()(), (()()),
and ε, while )( and (() are not.

A grammar $G_{bal} = (\{B\}, \{(, )\}, P, B)$ generates all and only the strings of
balanced parentheses, where P consists of the productions:

$$B \rightarrow BB \mid (B) \mid \epsilon$$


The first production, $B \rightarrow BB$, says that the concatenation of two strings of
balanced parentheses is balanced. That assertion makes sense, because we can
match the parentheses in the two strings independently. The second production,
$B \rightarrow (B)$, says that if we place a pair of parentheses around a balanced string,
then the result is balanced. Again, this rule makes sense, because if we match
the parentheses in the inner string, then they are all eliminated and we are then
allowed to match the first and last parentheses, which have become adjacent.
The third production, $B \rightarrow \epsilon$, is the basis; it says that the empty string is
balanced.

The above informal arguments should convince us that every string generated
by $G_{bal}$ is balanced. We also need a proof of the converse: that every
string of balanced parentheses is generated by this grammar. Such a proof,
by induction on the length of the balanced string, is not hard and is left as an
exercise.
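Neither direction of the claim is proved here, but both can be spot-checked by machine for short strings. The sketch below is an empirical check, not a proof. It computes the strings of $G_{bal}$ length by length directly from its three productions. It then compares them with a simple depth-counter test for balance over every string of parentheses up to length 8. The function names are our own.

```python
from itertools import product

def balanced(s):
    """Depth-counter test: no prefix has more ')' than '(', and the totals agree."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def gbal_strings(max_len):
    """All strings of length <= max_len generated by B -> BB | (B) | ε."""
    by_len = [{""}]                                   # B -> ε
    for n in range(1, max_len + 1):
        found = set()
        if n >= 2:                                    # B -> (B)
            found |= {"(" + x + ")" for x in by_len[n - 2]}
        for i in range(1, n):                         # B -> BB with both parts nonempty;
            found |= {x + y for x in by_len[i]        # an empty part adds nothing new
                      for y in by_len[n - i]}
        by_len.append(found)
    return set().union(*by_len)

generated = gbal_strings(8)
all_strings = {"".join(p) for n in range(9) for p in product("()", repeat=n)}
print(generated == {s for s in all_strings if balanced(s)})  # True
```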

We mentioned that the set of strings of balanced parentheses is not a regular
language, and we shall now prove that fact. If $L(G_{bal})$ were regular, then there
would be a constant n for this language from the pumping lemma for regular
languages. Consider the balanced string $w = (^n)^n$, that is, n left parentheses
followed by n matching right parentheses. If we break $w = xyz$ according to
the pumping lemma, then, since $|xy| \le n$ and $y \neq \epsilon$, y consists of one or more
left parentheses only, and therefore xz has more right parentheses than left. This
string is not balanced, contradicting the assumption that the language of balanced
parentheses is regular.


Programming languages consist of more than parentheses, of course, but 
parentheses are an essential part of arithmetic or conditional expressions. The 
grammar of Fig. 5.2 is more typical of the structure of arithmetic expressions, 
although we used only two operators, plus and times, and we included the de- 
tailed structure of identifiers, which would more likely be handled by the lexical- 
analyzer portion of the compiler, as we mentioned in Section 3.3.2. However, 
the language described in Fig. 5.2 is not regular either. For instance, according
to this grammar, $(^n a)^n$ is a legal expression. We can use the pumping lemma
to show that if the language were regular, then a string with some of the left 
parentheses removed and the a and all right parentheses intact would also be a 
legal expression, which it is not. 

There are numerous aspects of a typical programming language that behave 
like balanced parentheses. There will usually be parentheses themselves, in 
expressions of all types. Beginnings and endings of code blocks, such as begin 
and end in Pascal, or the curly braces {...} of C, are examples. That is, 
whatever curly braces appear in a C program must form a balanced sequence, 
with { in place of the left parenthesis and } in place of the right parenthesis. 

There is a related pattern that appears occasionally, where “parentheses” 
can be balanced with the exception that there can be unbalanced left parenthe- 
ses. An example is the treatment of if and else in C. An if-clause can appear 
unbalanced by any else-clause, or it may be balanced by a matching else-clause. 
A grammar that generates the possible sequences of if and else (represented 
by i and e, respectively) is: 


$$S \rightarrow \epsilon \mid SS \mid iS \mid iSeS$$


For instance, ieie, iie, and iei are possible sequences of if’s and else’s, and 
each of these strings is generated by the above grammar. Some examples of 
illegal sequences, not generated by the grammar, are ei and ieeii. 

A simple test (whose correctness we leave as an exercise) for whether a
sequence of i's and e's is generated by the grammar is to consider each e, in
turn from the left. Look for the first i to the left of the e being considered. If
there is none, the string fails the test and is not in the language. If there is such
an i, delete this i and the e being considered. Then, if there are no more e's,
the string passes the test and is in the language. If there are more e's, proceed
to consider the next one.
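Deleting the nearest remaining i to the left of each e has the same effect as keeping the unmatched i's on a stack. The test therefore runs in one left-to-right pass. Below is a sketch, with `match_else` a hypothetical name. Besides accepting or rejecting, it reports which i each e is paired with, which is the information the compiler needs.

```python
def match_else(seq):
    """Run the test on a string over {'i', 'e'}: each e, from the left, is paired
    with the nearest still-unmatched i to its left. Returns the (i_pos, e_pos)
    pairs, or None if some e has no i available."""
    open_ifs = []                     # positions of unmatched i's, nearest on top
    pairs = []
    for pos, c in enumerate(seq):
        if c == "i":
            open_ifs.append(pos)
        elif c == "e":
            if not open_ifs:
                return None           # an else with no if left: the test fails
            pairs.append((open_ifs.pop(), pos))
    return pairs                      # leftover i's are unmatched ifs, which is legal

print(match_else("iee"))    # None
print(match_else("iieie"))  # [(1, 2), (3, 4)]
```

For iieie, the first i (position 0) is left unmatched, which agrees with Example 5.20 and the structure shown in Fig. 5.10.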
Example 5.20: Consider the string iee. The first e is matched with the i to
its left. They are removed, leaving the string e. Since there are more e's we
consider the next. However, there is no i to its left, so the test fails; iee is not
in the language. Note that this conclusion is valid, since you cannot have more
else's than if's in a C program.

For another example, consider iieie. Matching the first e with the i to its
left leaves iie. Matching the remaining e with the i to its left leaves i. Now
there are no more e's, so the test succeeds. This conclusion also makes sense,
because the sequence iieie corresponds to a C program whose structure is like 
that of Fig. 5.10. In fact, the matching algorithm also tells us (and the C 
compiler) which if matches any given else. That knowledge is essential if the 
compiler is to create the control-flow logic intended by the programmer. 


if (Condition) {
    if (Condition) Statement;
    else Statement;
    if (Condition) Statement;
    else Statement;
}


Figure 5.10: An if-else structure; the two else’s match their previous if’s, and 
the first if is unmatched 


5.3.2 The YACC Parser-Generator 


The generation of a parser (function that creates parse trees from source pro- 
grams) has been institutionalized in the YACC command that appears in all 
UNIX systems. The input to YACC is a CFG, in a notation that differs only 
in details from the one we have used here. Associated with each production is 
an action, which is a fragment of C code that is performed whenever a node of 
the parse tree that (with its children) corresponds to this production is created. 
Typically, the action is code to construct that node, although in some YACC 
applications the tree is not actually constructed, and the action does something 
else, such as emit a piece of object code. 


Example 5.21: In Fig. 5.11 is a sample of a CFG in the YACC notation. 
The grammar is the same as that of Fig. 5.2. We have elided the actions, just 
showing their (required) curly braces and their position in the YACC input. 

Notice the following correspondences between the YACC notation for gram- 
mars and ours: 
Exp : Id             {...}
    | Exp '+' Exp    {...}
    | Exp '*' Exp    {...}
    | '(' Exp ')'    {...}
    ;

Id  : 'a'            {...}
    | 'b'            {...}
    | Id 'a'         {...}
    | Id 'b'         {...}
    | Id '0'         {...}
    | Id '1'         {...}
    ;


Figure 5.11: An example of a grammar in the YACC notation 


• The colon is used as the production symbol, our $\rightarrow$.


• All the productions with a given head are grouped together, and their
bodies are separated by the vertical bar. We also allow this convention,
as an option.


• The list of bodies for a given head ends with a semicolon. We have not
used a terminating symbol.


• Terminals are quoted with single quotes. Several characters can appear
within a single pair of quotes. Although we have not shown it, YACC allows
its user to define symbolic terminals as well. The occurrences of these
terminals in the source program are detected by the lexical analyzer and
signaled, through the return-value of the lexical analyzer, to the parser.


• Unquoted strings of letters and digits are variable names. We have taken
advantage of this capability to give our two variables more descriptive
names, Exp and Id, although E and I could have been used.


5.3.3 Markup Languages 


We shall next consider a family of “languages” called markup languages. The 
“strings” in these languages are documents with certain marks (called tags) in 
them. Tags tell us something about the semantics of various strings within the 
document. 

The markup language with which you are probably most familiar is HTML 
(HyperText Markup Language). This language has two major functions: creat- 
ing links between documents and describing the format (“look”) of a document. 
We shall offer only a simplified view of the structure of HTML, but the following
examples should suggest both its structure and how a CFG could be used
both to describe the legal HTML documents and to guide the processing (i.e., 
the display on a monitor or printer) of a document. 


Example 5.22: Figure 5.12(a) shows a piece of text, comprising a list of items, 
and Fig. 5.12(b) shows its expression in HTML. Notice from Fig. 5.12(b) that 
HTML consists of ordinary text interspersed with tags. Matching tags are of 
the form <x> and </x> for some string x.³ For instance, we see the matching
tags <EM> and </EM>, which indicate that the text between them should be 
emphasized, that is, put in italics or another appropriate font. We also see the 
matching tags <OL> and </OL>, indicating an ordered list, i.e., an enumeration 
of list items. 


The things I hate: 
1. Moldy bread. 
2. People who drive too slow in the fast lane. 
(a) The text as viewed 


<P>The things I <EM>hate</EM>: 
<OL> 

<LI>Moldy bread. 

<LI>People who drive too slow 
in the fast lane. 

</OL> 


(b) The HTML source 


Figure 5.12: An HTML document and its printed version 


We also see two examples of unmatched tags: <P> and <LI>, which introduce 
paragraphs and list items, respectively. HTML allows, indeed encourages, that 
these tags be matched by </P> and </LI> at the ends of paragraphs and list 
items, but it does not require the matching. We have therefore left the matching 
tags off, to provide some complexity to the sample HTML grammar we shall 
develop. 


There are a number of classes of strings that are associated with an HTML 
document. We shall not try to list them all, but here are the ones essential to 
the understanding of text like that of Example 5.22. For each class, we shall 
introduce a variable with a descriptive name. 


³Sometimes the introducing tag <x> has more information in it than just the name x for
the tag. However, we shall not consider that possibility in examples.
1. Text is any string of characters that can be literally interpreted; i.e., it
has no tags. An example of a Text element in Fig. 5.12(a) is “Moldy
bread.”


2. Char is any string consisting of a single character that is legal in HTML 
text. Note that blanks are included as characters. 


3. Doc represents documents, which are sequences of “elements.” We define 
elements next, and that definition is mutually recursive with the definition 
of a Doc. 


4. Element is either a Text string, or a pair of matching tags and the doc- 
ument between them, or an unmatched tag followed by a document. 


5. ListItem is the <LI> tag followed by a document, which is a single list 
item. 


6. List is a sequence of zero or more list items. 


1. Char → a | A | ···
2. Text → ε | Char Text
3. Doc → ε | Element Doc
4. Element → Text |
             <EM> Doc </EM> |
             <P> Doc |
             <OL> List </OL> | ···
5. ListItem → <LI> Doc
6. List → ε | ListItem List


Figure 5.13: Part of an HTML grammar 


Figure 5.13 is a CFG that describes as much of the structure of the HTML 
language as we have covered. In line (1) it is suggested that a character can 
be “a” or “A” or many other possible characters that are part of the HTML 
character set. Line (2) says, using two productions, that Text can be either the 
empty string, or any legal character followed by more text. Put another way, 
Text is zero or more characters. Note that < and > are not legal characters, 
although they can be represented by the sequences &lt; and &gt;, respectively.
Thus, we cannot accidentally get a tag into Text.

Line (3) says that a document is a sequence of zero or more “elements.” An 
element in turn, we learn at line (4), is either text, an emphasized document, a 
paragraph-beginning followed by a document, or a list. We have also suggested
that there are other productions for Element, corresponding to the other kinds 
of tags that appear in HTML. Then, in line (5) we find that a list item is the 
<LI> tag followed by any document, and line (6) tells us that a list is a sequence 
of zero or more list elements. 
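
The grammar of Fig. 5.13 is already close to a program. The following Python sketch turns it into a recognizer under assumptions of our own: the input has already been split into tokens (a tag such as "<EM>", or a chunk of Text), only the tags of Fig. 5.13 are handled, and all function names are hypothetical. Because a Doc can only be followed by the end of the input, </EM>, </OL>, or another <LI>, each Doc can simply run until one of those tokens appears; letting a <P> Doc absorb the elements after it accepts the same set of token strings as stopping earlier would.

```python
# Hypothetical recognizer for the HTML fragment of Fig. 5.13.
# A token is either a tag such as "<EM>" or a chunk of Text (anything not starting with "<").
STOPPERS = {"</EM>", "</OL>", "<LI>"}   # can follow a Doc, but never begin an Element


def parse_doc(tokens, pos):
    """Doc -> ε | Element Doc: consume Elements until a stopper or the end."""
    while pos < len(tokens) and tokens[pos] not in STOPPERS:
        pos = parse_element(tokens, pos)
    return pos


def parse_element(tokens, pos):
    """Element -> Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL>"""
    tok = tokens[pos]
    if tok == "<EM>":
        return expect(tokens, parse_doc(tokens, pos + 1), "</EM>")
    if tok == "<P>":
        return parse_doc(tokens, pos + 1)
    if tok == "<OL>":
        return expect(tokens, parse_list(tokens, pos + 1), "</OL>")
    if tok.startswith("<"):
        raise SyntaxError(f"tag {tok} is not covered by this fragment")
    return pos + 1                        # a chunk of Text


def parse_list(tokens, pos):
    """List -> ε | ListItem List, with ListItem -> <LI> Doc"""
    while pos < len(tokens) and tokens[pos] == "<LI>":
        pos = parse_doc(tokens, pos + 1)
    return pos


def expect(tokens, pos, tag):
    """Consume the closing tag `tag` at pos, or raise SyntaxError."""
    if pos >= len(tokens) or tokens[pos] != tag:
        raise SyntaxError(f"expected {tag} at token {pos}")
    return pos + 1


def is_doc(tokens):
    """True if the entire token list derives from Doc."""
    try:
        return parse_doc(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


doc = ["<P>", "Some text", "<OL>", "<LI>", "Moldy bread.",
       "<LI>", "People who drive too slow in the fast lane.", "</OL>"]
print(is_doc(doc))               # True
print(is_doc(["<EM>", "hate"]))  # False: <EM> is never closed
```

Note how the unmatched <P> and <LI> tags need no closers: the point where their Doc ends is decided entirely by the token that follows it.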

Some aspects of HTML do not require the power of context-free grammars; 
regular expressions are adequate. For example, lines (1) and (2) of Fig. 5.13 
simply say that Text represents the same language as does the regular
expression (a + A + ···)*. However, some aspects of HTML do require the power
of CFG’s. For instance, each pair of tags that are a corresponding beginning 
and ending pair, e.g., <EM> and </EM>, is like balanced parentheses, which we 
already know are not regular. 


5.3.4 XML and Document-Type Definitions 


The fact that HTML is described by a grammar is not in itself remarkable. 
Essentially all programming languages can be described by their own CFG’s, 
so it would be more surprising if we could not so describe HTML. However, 
when we look at another important markup language, XML (eXtensible Markup 
Language), we find that the CFG’s play a more vital role, as part of the process 
of using that language. 

The purpose of XML is not to describe the formatting of the document; 
that is the job for HTML. Rather, XML tries to describe the “semantics” of 
the text. For example, text like “12 Maple St.” looks like an address, but is 
it? In XML, tags would surround a phrase that represented an address; for 
example: 


<ADDR>12 Maple St.</ADDR> 


However, it is not immediately obvious that <ADDR> means the address of a 
building. For instance, if the document were about memory allocation, we might 
expect that the <ADDR> tag would refer to a memory address. To make clear 
what the different kinds of tags are, and what structures may appear between 
matching pairs of these tags, people with a common interest are expected to 
develop standards in the form of a DTD (Document-Type Definition). 

A DTD is essentially a context-free grammar, with its own notation for 
describing the variables and productions. In the next example, we shall show 
a simple DTD and introduce some of the language used for describing DTD’s. 
The DTD language itself has a context-free grammar, but it is not that grammar 
we are interested in describing. Rather, the language for describing DTD’s is 
essentially a CFG notation, and we want to see how CFG’s are expressed in 
this language. 

The form of a DTD is 


<!DOCTYPE name-of-DTD [ 
list of element definitions 


]> 
An element definition, in turn, has the form
<!ELEMENT element-name (description of the element)> 


Element descriptions are essentially regular expressions. The basis of these 
expressions are: 


1. Other element names, representing the fact that elements of one type can 
appear within elements of another type, just as in HTML we might find 
emphasized text within a list. 


2. The special term #PCDATA, standing for any text that does not involve 
XML tags. This term plays the role of variable Text in Example 5.22. 


The allowed operators are: 


1. | standing for union, as in the UNIX regular-expression notation discussed 
in Section 3.3.1. 


2. A comma, denoting concatenation. 


3. Three variants of the closure operator, as in Section 3.3.1. These are *,
the usual operator meaning “zero or more occurrences of,” +, meaning
“one or more occurrences of,” and ?, meaning “zero or one occurrence
of.”


Parentheses may group operators to their arguments; otherwise, the usual prece- 
dence of regular-expression operators applies. 
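
Since element descriptions are regular expressions, one way to check the children of an element is to hand its description to an ordinary regular-expression engine. The Python sketch below does that with the standard re module. It assumes element-only content (no #PCDATA), and the helper names are our own. Commas are dropped because concatenation is implicit in a regex, and each child name is matched together with a trailing space so that adjacent names cannot run together.

```python
import re

# Sketch: compile a DTD element description into a Python regex over the
# sequence of child-element names. Assumes element-only content (no #PCDATA)
# and names made of letters, digits, '_', '.', and '-'.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|*+?]")


def content_model_to_regex(model):
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")                      # non-capturing group
        elif tok == ",":
            continue                                 # concatenation is implicit in a regex
        elif tok in {")", "|", "*", "+", "?"}:
            parts.append(tok)                        # same meaning in DTDs and in re
        else:
            parts.append(f"(?:{re.escape(tok)} )")   # one child name plus a space
    return re.compile("".join(parts))


def children_ok(model, child_names):
    """True if the list of child tags is allowed by the element description."""
    text = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(text) is not None


pc = "(MODEL, PRICE, PROCESSOR, RAM, DISK+)"
print(children_ok(pc, ["MODEL", "PRICE", "PROCESSOR", "RAM", "DISK", "DISK"]))  # True
print(children_ok(pc, ["MODEL", "PRICE", "PROCESSOR", "RAM"]))                  # False
print(children_ok("(HARDDISK | CD | DVD)", ["CD"]))                             # True
```

The second call fails because DISK+ demands at least one disk; the usual regex precedence (closure, then concatenation, then union) is exactly the one assumed for element descriptions.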


Example 5.23: Let us imagine that computer vendors get together to create 
a standard DTD that they can use to publish, on the Web, descriptions of the 
various PC’s that they currently sell. Each description of a PC will have a 
model number, and details about the features of the model, e.g., the amount of 
RAM, number and size of disks, and so on. Figure 5.14 shows a hypothetical, 
very simple, DTD for personal computers. 

The name of the DTD is PcSpecs. The first element, which is like the start 
symbol of a CFG, is PCS (list of PC specifications). Its definition, PC*, says 
that a PCS is zero or more PC entries. 

We then see the definition of a PC element. It consists of the concatenation 
of five things. The first four are other elements, corresponding to the model, 
price, processor type, and RAM of the PC. Each of these must appear once, 
in that order, since the comma represents concatenation. The last constituent, 
DISK+, tells us that there will be one or more disk entries for a PC. 

Many of the constituents are simply text; MODEL, PRICE, and RAM are of this 
type. However, PROCESSOR has more structure. We see from its definition that 
it consists of a manufacturer, model, and speed, in that order; each of these 
elements is simple text. 


202 CHAPTER 5. CONTEXT-FREE GRAMMARS AND LANGUAGES 


<!DOCTYPE PcSpecs [ 
<!ELEMENT PCS (PC*)> 
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)> 
<!ELEMENT MODEL (#PCDATA)> 
<!ELEMENT PRICE (#PCDATA)> 
<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)> 
<!ELEMENT MANF (#PCDATA)> 
<!ELEMENT MODEL (#PCDATA)> 
<!ELEMENT SPEED (#PCDATA)> 
<!ELEMENT RAM (#PCDATA)> 
<!ELEMENT DISK (HARDDISK | CD | DVD)> 
<!ELEMENT HARDDISK (MANF, MODEL, SIZE)> 
<!ELEMENT SIZE (#PCDATA)> 
<!ELEMENT CD (SPEED)> 
<!ELEMENT DVD (SPEED)>
]>


Figure 5.14: A DTD for personal computers 


A DISK entry is the most complex. First, a disk is either a hard disk, CD, or 
DVD, as indicated by the rule for element DISK, which is the OR of three other 
elements. Hard disks, in turn, have a structure in which the manufacturer, 
model, and size are specified, while CD’s and DVD’s are represented only by 
their speed. 

Figure 5.15 is an example of an XML document that conforms to the DTD 
of Fig. 5.14. Notice that each element is represented in the document by a tag 
with the name of that element and a matching tag at the end, with an extra 
slash, just as in HTML. Thus, in Fig. 5.15 we see at the outermost level the tag 
<PCS>...</PCS>. Inside these tags appears a list of entries, one for each PC 
sold by this manufacturer; we have only shown one such entry explicitly. 

Within the illustrated <PC> entry, we can easily see that the model number 
is 4560, the price is $2295, and it has an 800MHz Intel Pentium processor. It 
has 256Mb of RAM, a 30.5Gb Maxtor Diamond hard disk, and a 32x CD-ROM 
reader. What is important is not that we can read these facts, but that a 
program could read the document, and guided by the grammar in the DTD 
of Fig. 5.14 that it has also read, could interpret the numbers and names in 
Fig. 5.15 properly. 
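
As a small illustration of that point, the Python sketch below uses the standard library's xml.etree.ElementTree to read the one PC entry of Fig. 5.15 and pick out its facts. ElementTree does not read or enforce the DTD; the sketch simply relies on the structure the DTD promises, for instance that the PC's own MODEL is a direct child of PC while the processor's MODEL sits inside PROCESSOR. The function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# The PC entry shown in Fig. 5.15 (the elided second entry is omitted).
DOC = """<PCS><PC>
<MODEL>4560</MODEL><PRICE>$2295</PRICE>
<PROCESSOR><MANF>Intel</MANF><MODEL>Pentium</MODEL><SPEED>800MHz</SPEED></PROCESSOR>
<RAM>256</RAM>
<DISK><HARDDISK><MANF>Maxtor</MANF><MODEL>Diamond</MODEL><SIZE>30.5Gb</SIZE></HARDDISK></DISK>
<DISK><CD><SPEED>32x</SPEED></CD></DISK>
</PC></PCS>"""


def summarize(xml_text):
    """Return one dict per PC; each path is searched relative to its PC element."""
    pcs = []
    for pc in ET.fromstring(xml_text).findall("PC"):
        pcs.append({
            "model": pc.findtext("MODEL"),             # direct child: the PC's model
            "price": pc.findtext("PRICE"),
            "cpu": (pc.findtext("PROCESSOR/MANF"),     # MODEL inside PROCESSOR
                    pc.findtext("PROCESSOR/MODEL"),    # means the processor's model
                    pc.findtext("PROCESSOR/SPEED")),
            "ram": pc.findtext("RAM"),
            "disks": [disk[0].tag for disk in pc.findall("DISK")],  # HARDDISK, CD, or DVD
        })
    return pcs


print(summarize(DOC))
```

The same tag name MODEL means two different things here, and only the position of the element in the tree, as fixed by the DTD, tells them apart.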


You may have noticed that the rules for the elements in DTD’s like Fig. 5.14 
are not quite like productions of context-free grammars. Many of the rules are 
of the correct form. For instance, 


<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
<PCS>
    <PC>
        <MODEL>4560</MODEL>
        <PRICE>$2295</PRICE>
        <PROCESSOR>
            <MANF>Intel</MANF>
            <MODEL>Pentium</MODEL>
            <SPEED>800MHz</SPEED>
        </PROCESSOR>
        <RAM>256</RAM>
        <DISK><HARDDISK>
            <MANF>Maxtor</MANF>
            <MODEL>Diamond</MODEL>
            <SIZE>30.5Gb</SIZE>
        </HARDDISK></DISK>
        <DISK><CD>
            <SPEED>32x</SPEED>
        </CD></DISK>
    </PC>
    <PC>
        ···
    </PC>
</PCS>


Figure 5.15: Part of a document obeying the structure of the DTD in Fig. 5.14 


is analogous to the production 
Processor → Manf Model Speed
However, the rule 
<!ELEMENT DISK (HARDDISK | CD | DVD)> 
does not have a definition for DISK that is like a production body. In this case, 
the extension is simple: we may interpret this rule as three productions, with 


the vertical bar playing the same role as it does in our shorthand for productions 
having a common head. Thus, this rule is equivalent to the three productions 


Disk → HardDisk | Cd | Dvd
The most difficult case is 


<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)> 
where the “body” has a closure operator within it. The solution is to replace
DISK+ by a new variable, say Disks, that generates, via a pair of productions, 
one or more instances of the variable Disk. The equivalent productions are 
thus: 


Pc → Model Price Processor Ram Disks
Disks → Disk | Disk Disks


There is a general technique for converting a CFG with regular expressions 
as production bodies to an ordinary CFG. We shall give the idea informally; 
you may wish to formalize both the meaning of CFG’s with regular-expression 
productions and a proof that the extension yields no new languages beyond 
the CFL’s. We show, inductively, how to convert a production with a regular- 
expression body to a collection of equivalent ordinary productions. The induc- 
tion is on the size of the expression in the body. 


BASIS: If the body is the concatenation of elements, then the production is 
already in the legal form for CFG’s, so we do nothing. 


INDUCTION: Otherwise, there are five cases, depending on the final operator 
used. 


1. The production is of the form A → E1, E2, where E1 and E2 are expressions
permitted in the DTD language. This is the concatenation case.
Introduce two new variables, B and C, that appear nowhere else in the
grammar. Replace A → E1, E2 by the productions


A → BC
B → E1
C → E2


The first production, A → BC, is legal for CFG’s. The last two may or
may not be legal. However, their bodies are shorter than the body of the 
original production, so we may inductively convert them to CFG form. 


2. The production is of the form A → E1 | E2. For this union operator,
replace this production by the pair of productions:


A → E1
A → E2


Again, these productions may or may not be legal CFG productions, but 
their bodies are shorter than the body of the original. We may therefore 
apply the rules recursively and eventually convert these new productions 
to CFG form. 


3. The production is of the form A → (E1)*. Introduce a new variable B
that appears nowhere else, and replace this production by:


A → BA
A → ε
B → E1


4. The production is of the form A → (E1)+. Introduce a new variable B
that appears nowhere else, and replace this production by:


A → BA
A → B
B → E1


5. The production is of the form A → (E1)?. Replace this production by:


A → ε
A → E1
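
The five cases can be written as one recursive procedure. The Python sketch below is one possible rendering, assuming a small hypothetical tuple representation for DTD bodies and fresh variable names V1, V2, …; an empty body stands for ε. It departs from case (2) in one respect: a branch of a union that is not already a concatenation of names gets its own fresh variable instead of reusing A. Reusing A literally can let the recursive production A → BA of cases (3) and (4) combine with A’s other alternatives; for A → (X)* | Y it would let A derive XY.

```python
from itertools import count

# Hypothetical AST for a DTD-style body:
#   ("sym", X)  a single name     ("cat", E1, E2)  E1, E2    ("alt", E1, E2)  E1 | E2
#   ("star", E1)  (E1)*           ("plus", E1)  (E1)+        ("opt", E1)  (E1)?


def as_concat(expr):
    """Return the list of names if expr is just a concatenation of names, else None."""
    if expr[0] == "sym":
        return [expr[1]]
    if expr[0] == "cat":
        left, right = as_concat(expr[1]), as_concat(expr[2])
        if left is not None and right is not None:
            return left + right
    return None


def convert(head, expr, fresh, out):
    """Append ordinary productions (head, body) to out; an empty body means ε."""
    body = as_concat(expr)
    if body is not None:                       # BASIS: already legal CFG form
        out.append((head, body))
        return
    op = expr[0]
    if op == "cat":                            # case 1: A -> BC, B -> E1, C -> E2
        b, c = next(fresh), next(fresh)
        out.append((head, [b, c]))
        convert(b, expr[1], fresh, out)
        convert(c, expr[2], fresh, out)
    elif op == "alt":                          # case 2: A -> E1, A -> E2
        for branch in expr[1:]:
            if as_concat(branch) is not None:
                out.append((head, as_concat(branch)))
            else:                              # deviation: isolate the branch in a fresh variable
                v = next(fresh)
                out.append((head, [v]))
                convert(v, branch, fresh, out)
    elif op in ("star", "plus"):               # cases 3 and 4: A -> BA, and A -> ε or A -> B
        b = next(fresh)
        out.append((head, [b, head]))
        out.append((head, [] if op == "star" else [b]))
        convert(b, expr[1], fresh, out)
    elif op == "opt":                          # case 5: A -> ε, A -> E1
        out.append((head, []))
        convert(head, expr[1], fresh, out)
    else:
        raise ValueError(f"unknown operator: {op!r}")


rules = []
fresh = (f"V{i}" for i in count(1))
pc_body = ("cat",
           ("cat", ("cat", ("cat", ("sym", "Model"), ("sym", "Price")), ("sym", "Processor")),
            ("sym", "Ram")),
           ("plus", ("sym", "Disk")))
convert("Pc", pc_body, fresh, rules)
for head, body in rules:
    print(head, "->", " ".join(body) if body else "ε")
# Pc -> V1 V2
# V1 -> Model Price Processor Ram
# V2 -> V3 V2
# V2 -> V3
# V3 -> Disk
```

Here V1, V2, and V3 play the roles that A, B, and C play in the example that follows.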


Example 5.24: Let us consider how to convert the DTD rule 
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)> 


to legal CFG productions. First, we can view the body of this rule as the con- 
catenation of two expressions, the first of which is MODEL, PRICE, PROCESSOR, 
RAM and the second of which is DISK+. If we create variables for these two 
subexpressions, say A and B, respectively, then we can use the productions: 


Pc → AB
A → Model Price Processor Ram
B → Disk+


Only the last of these is not in legal form. We introduce another variable C 
and the productions: 


B → CB | C
C → Disk


In this special case, because the expression that A derives is just a concate- 
nation of variables, and Disk is a single variable, we actually have no need for 
the variables A or C. We could use the following productions instead: 


Pc → Model Price Processor Ram B
B → Disk B | Disk
5.3.5 Exercises for Section 5.3


Exercise 5.3.1: Prove that if a string of parentheses is balanced, in the sense 
given in Example 5.19, then it is generated by the grammar B → BB | (B) | ε.
Hint: Perform an induction on the length of the string. 


Exercise 5.3.2: Consider the set of all strings of balanced parentheses of two 
types, round and square. An example of where these strings come from is as 
follows. If we take expressions in C, which use round parentheses for grouping 
and for arguments of function calls, and use square brackets for array indexes, 
and drop out everything but the parentheses, we get all strings of balanced 
parentheses of these two types. For example, 


f(a[i]*(b[i][j],c[g(x)]),d[i])


becomes the balanced-parenthesis string ([]([][][()])[]). Design a grammar
for all and only the strings of round and square parentheses that are balanced.


Exercise 5.3.3: In Section 5.3.1, we considered the grammar 
S → ε | SS | iS | iSeS


and claimed that we could test for membership in its language L by repeatedly 
doing the following, starting with a string w. The string w changes during 
repetitions. 


1. If the current string begins with e, fail; w is not in L. 
2. If the string currently has no e’s (it may have i’s), succeed; w is in L. 


3. Otherwise, delete the first e and the i immediately to its left. Then repeat
these three steps on the new string. 


Prove that this process correctly identifies the strings in L. 
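
The three steps are easy to mechanize, which may help when experimenting with small cases before attempting the proof. A minimal Python sketch, assuming the input is a string over the two symbols i and e (the function name in_L is hypothetical):

```python
def in_L(w):
    """Steps 1-3 of Exercise 5.3.3, for a string w over the symbols i and e."""
    while True:
        if w.startswith("e"):        # step 1: fail
            return False
        if "e" not in w:             # step 2: succeed
            return True
        k = w.index("e")             # step 3: k >= 1 here, and w[k - 1] is an i
        w = w[:k - 1] + w[k + 1:]    # delete that i and the first e, then repeat


for w in ["", "iie", "ieie", "iee", "ei"]:
    print(repr(w), in_L(w))
# '' True / 'iie' True / 'ieie' True / 'iee' False / 'ei' False
```
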
Exercise 5.3.4: Add the following forms to the HTML grammar of Fig. 5.13: 
* a) A list item must be ended by a closing tag </LI>. 


b) An element can be an unordered list, as well as an ordered list. Unordered 
lists are surrounded by the tag <UL> and its closing </UL>. 


! c) An element can be a table. Tables are surrounded by <TABLE> and its 
closer </TABLE>. Inside these tags are one or more rows, each of which 
is surrounded by <TR> and </TR>. The first row is the header, with one 
or more fields, each introduced by the <TH> tag (we'll assume these are 
not closed, although they should be). Subsequent rows have their fields 
introduced by the <TD> tag. 


Exercise 5.3.5: Convert the DTD of Fig. 5.16 to a context-free grammar. 


5.4. AMBIGUITY INGRAMMARS AND LANGUAGES 207 


<!DOCTYPE CourseSpecs [ 
<!ELEMENT COURSES (COURSE+)> 
<!ELEMENT COURSE (CNAME, PROF, STUDENT*, TA?)> 
<!ELEMENT CNAME (#PCDATA)> 
<!ELEMENT PROF (#PCDATA)> 
<!ELEMENT STUDENT (#PCDATA)> 
<!ELEMENT TA (#PCDATA)> ]> 


Figure 5.16: A DTD for courses 


5.4 Ambiguity in Grammars and Languages 


As we have seen, applications of CFG’s often rely on the grammar to provide 
the structure of files. For instance, we saw in Section 5.3 how grammars can be 
used to put structure on programs and documents. The tacit assumption was 
that a grammar uniquely determines a structure for each string in its language. 
However, we shall see that not every grammar does provide unique structures. 

When a grammar fails to provide unique structures, it is sometimes possible 
to redesign the grammar to make the structure unique for each string in the 
language. Unfortunately, sometimes we cannot do so. That is, there are some 
CFL’s that are “inherently ambiguous”; every grammar for the language puts 
more than one structure on some strings in the language. 


5.4.1 Ambiguous Grammars 


Let us return to our running example: the expression grammar of Fig. 5.2. This 
grammar lets us generate expressions with any sequence of * and + operators,
and the productions E → E + E | E * E allow us to generate these expressions
in any order we choose. 


Example 5.25: For instance, consider the sentential form E + E * E. It has
two derivations from E:


1. E ⇒ E + E ⇒ E + E * E


2. E ⇒ E * E ⇒ E + E * E


Notice that in derivation (1), the second E is replaced by E * E, while in
derivation (2), the first E is replaced by E + E. Figure 5.17 shows the two 
parse trees, which we should note are distinct trees. 

The difference between these two derivations is significant. As far as the 
structure of the expressions is concerned, derivation (1) says that the second and 
third expressions are multiplied, and the result is added to the first expression,
while derivation (2) adds the first two expressions and multiplies the result by 
the third. In more concrete terms, the first derivation suggests that 1 + 2 * 3 


(a) (b) 


Figure 5.17: Two parse trees with the same yield 


should be grouped 1 + (2 * 3) = 7, while the second derivation suggests the
same expression should be grouped (1 + 2) * 3 = 9. Obviously, the first of 
these, and not the second, matches our notion of correct grouping of arithmetic 
expressions. 

Since the grammar of Fig. 5.2 gives two different structures to any string 
of terminals that is derived by replacing the three expressions in E + E * E by
identifiers, we see that this grammar is not a good one for providing unique 
structure. In particular, while it can give strings the correct grouping as arith- 
metic expressions, it also gives them incorrect groupings. To use this expression 
grammar in a compiler, we would have to modify it to provide only the correct 
groupings. 


On the other hand, the mere existence of different derivations for a string 
(as opposed to different parse trees) does not imply a defect in the grammar. 
The following is an example. 


Example 5.26: Using the same expression grammar, we find that the string 
a+b has many different derivations. Two examples are: 


1. E ⇒ E + E ⇒ I + E ⇒ a + E ⇒ a + I ⇒ a + b


2. E ⇒ E + E ⇒ E + I ⇒ I + I ⇒ I + b ⇒ a + b


However, there is no real difference between the structures provided by these 
derivations; they each say that a and b are identifiers, and that their values are 
to be added. In fact, both of these derivations produce the same parse tree if 
the constructions of Theorems 5.18 and 5.12 are applied.


The two examples above suggest that it is not a multiplicity of derivations 
that causes ambiguity, but rather the existence of two or more parse trees. Thus,
we say a CFG G = (V,T, P, S) is ambiguous if there is at least one string w 
in T* for which we can find two different parse trees, each with root labeled S 
and yield w. If each string has at most one parse tree in the grammar, then the 
grammar is unambiguous. 

For instance, Example 5.25 almost demonstrated the ambiguity of the gram- 
mar of Fig. 5.2. We have only to show that the trees of Fig. 5.17 can be com- 
pleted to have terminal yields. Figure 5.18 is an example of that completion. 
(a) (b)


Figure 5.18: Trees with yield a + a * a, demonstrating the ambiguity of our
expression grammar
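
For a particular string, ambiguity can be checked mechanically by counting parse trees. The Python sketch below counts, by dynamic programming over substrings, the parse trees with root E for a token list in the grammar of Fig. 5.2 (E → I | E + E | E * E | (E), with I → a | b | Ia | Ib | I0 | I1). It assumes the input is already tokenized and treats each identifier as one token, since an identifier has only one tree below I; the function name is hypothetical. A count above 1 witnesses ambiguity, but this only examines one string at a time and is not a test of whether a grammar is ambiguous.

```python
import re
from functools import lru_cache

IDENT = re.compile(r"[ab][ab01]*")   # the strings generated by I -> a | b | Ia | Ib | I0 | I1


def count_trees(tokens):
    """Number of parse trees with root E and yield `tokens`, for
    E -> I | E + E | E * E | (E)."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def E(i, j):                      # trees whose yield is toks[i:j]
        n = 0
        if j - i == 1 and IDENT.fullmatch(toks[i]):
            n += 1                    # E -> I
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += E(i + 1, j - 1)      # E -> (E)
        for k in range(i + 1, j - 1):  # E -> E op E, with the operator at position k
            if toks[k] in ("+", "*"):
                n += E(i, k) * E(k + 1, j)
        return n

    return E(0, len(toks))


print(count_trees(["a", "+", "a", "*", "a"]))            # 2, the trees of Fig. 5.18
print(count_trees(["a", "+", "b"]))                      # 1
print(count_trees(["a", "+", "a", "+", "a", "+", "a"]))  # 5
```
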


5.4.2 Removing Ambiguity From Grammars 


In an ideal world, we would be able to give you an algorithm to remove ambi- 
guity from CFG’s, much as we were able to show an algorithm in Section 4.4 to 
remove unnecessary states of a finite automaton. However, the surprising fact 
is, as we shall show in Section 9.5.2, that there is no algorithm whatsoever that 
can even tell us whether a CFG is ambiguous in the first place. Moreover, we 
shall see in Section 5.4.4 that there are context-free languages that have nothing 
but ambiguous CFG’s; for these languages, removal of ambiguity is impossible. 

Fortunately, the situation in practice is not so grim. For the sorts of con- 
structs that appear in common programming languages, there are well-known 
techniques for eliminating ambiguity. The problem with the expression gram- 
mar of Fig. 5.2 is typical, and we shall explore the elimination of its ambiguity 
as an important illustration. 

First, let us note that there are two causes of ambiguity in the grammar of 
Fig. 5.2: 


1. The precedence of operators is not respected. While Fig. 5.17(a) properly 
groups the * before the + operator, Fig. 5.17(b) is also a valid parse tree
and groups the + ahead of the *. We need to force only the structure of
Fig. 5.17(a) to be legal in an unambiguous grammar. 


2. A sequence of identical operators can group either from the left or from the 
right. For example, if the *’s in Fig. 5.17 were replaced by +’s, we would 
see two different parse trees for the string E + E + E. Since addition 
and multiplication are associative, it doesn’t matter whether we group 
from the left or the right, but to eliminate ambiguity, we must pick one. 
The conventional approach is to insist on grouping from the left, so the 
structure of Fig. 5.17(b) is the only correct grouping of two +-signs. 
Ambiguity Resolution in YACC


If the expression grammar we have been using is ambiguous, we might 
wonder whether the sample YACC program of Fig. 5.11 is realistic. True, 
the underlying grammar is ambiguous, but much of the power of the YACC 
parser-generator comes from providing the user with simple mechanisms 
for resolving most of the common causes of ambiguity. For the expression 
grammar, it is sufficient to insist that: 


a) * takes precedence over +. That is, *’s must be grouped before
adjacent +’s on either side. This rule tells us to use derivation (1) 
in Example 5.25, rather than derivation (2). 


b) Both * and + are left-associative. That is, group sequences of
expressions, all of which are connected by *, from the left, and do the
same for sequences connected by +. 


YACC allows us to state the precedence of operators by listing them 
in order, from lowest to highest precedence. Technically, the precedence 
of an operator applies to the use of any production of which that operator 
is the rightmost terminal in the body. We can also declare operators to 
be left- or right-associative with the keywords %left and %right. For
instance, to declare that + and * were both left associative, with * taking 
precedence over +, we would put ahead of the grammar of Fig. 5.11 the 
statements: 


%left '+'
%left '*'
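
YACC enforces such declarations by resolving conflicts in the LR parser it generates, and the sketch below does not imitate that mechanism. Instead it uses precedence climbing, a different technique for hand-written parsers, driven by a table that mirrors the two %left declarations, to show the groupings those declarations are meant to produce. The Python function name and the fully parenthesized output format are our own choices.

```python
import re

# Mirrors  %left '+'  followed by  %left '*': an operator declared later binds tighter.
PRECEDENCE = {"+": 1, "*": 2}
LEFT_ASSOC = {"+", "*"}


def group(expr):
    """Return expr fully parenthesized according to PRECEDENCE and LEFT_ASSOC."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9]*|[+*()]", expr)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = climb(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing )")
            pos += 1
            return inner
        if tok in PRECEDENCE or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                                  # an identifier

    def climb(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # For a left-associative operator, the right operand may only contain
            # operators that bind strictly tighter, so equal operators group leftward.
            next_min = PRECEDENCE[op] + 1 if op in LEFT_ASSOC else PRECEDENCE[op]
            right = climb(next_min)
            left = f"({left} {op} {right})"
        return left

    result = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(group("a+a*a"))    # (a + (a * a))
print(group("a+b+a"))    # ((a + b) + a)
print(group("(a+b)*a"))  # ((a + b) * a)
```
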


The solution to the problem of enforcing precedence is to introduce several 
different variables, each of which represents those expressions that share a level 
of “binding strength.” Specifically: 


1. A factor is an expression that cannot be broken apart by any adjacent 
operator, either a * or a +. The only factors in our expression language
are: 


(a) Identifiers. It is not possible to separate the letters of an identifier 
by attaching an operator. 


(b) Any parenthesized expression, no matter what appears inside the 
parentheses. It is the purpose of parentheses to prevent what is inside 
from becoming the operand of any operator outside the parentheses. 
2. A term is an expression that cannot be broken by the + operator. In our
example, where + and * are the only operators, a term is a product of
one or more factors. For instance, the term a * b can be “broken” if we
use left associativity and place a1* to its left. That is, a1 * a * b is grouped
(a1 * a) * b, which breaks apart the a * b. However, placing an additive
term, such as a1+, to its left or +a1 to its right cannot break a * b. The
proper grouping of a1 + a * b is a1 + (a * b), and the proper grouping of
a * b + a1 is (a * b) + a1.


3. An expression will henceforth refer to any possible expression, including 
those that can be broken by either an adjacent * or an adjacent +. Thus, 
an expression for our example is a sum of one or more terms. 


I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T * F
E → T | E + T


Figure 5.19: An unambiguous expression grammar 


Example 5.27: Figure 5.19 shows an unambiguous grammar that generates 
the same language as the grammar of Fig. 5.2. Think of F, T, and E as the 
variables whose languages are the factors, terms, and expressions, as defined 
above. For instance, this grammar allows only one parse tree for the string 
a + a * a; it is shown in Fig. 5.20.




Figure 5.20: The sole parse tree for a + a * a


The fact that this grammar is unambiguous may be far from obvious. Here 
are the key observations that explain why no string in the language can have 
two different parse trees. 




• Any string derived from T, a term, must be a sequence of one or more
factors, connected by *’s. A factor, as we have defined it, and as follows 
from the productions for F in Fig. 5.19, is either a single identifier or any 
parenthesized expression. 


• Because of the form of the two productions for T, the only parse tree for
a sequence of factors is the one that breaks $f_1 * f_2 * \cdots * f_n$, for $n > 1$, into
a term $f_1 * f_2 * \cdots * f_{n-1}$ and a factor $f_n$. The reason is that F cannot
derive expressions like $f_{n-1} * f_n$ without introducing parentheses around
them. Thus, it is not possible that when using the production T → T * F,
the F derives anything but the last of the factors. That is, the parse tree
for a term can only look like Fig. 5.21.




Figure 5.21: The form of all parse trees for a term 


• Likewise, an expression is a sequence of terms connected by +. When
we use the production E → E + T to derive $t_1 + t_2 + \cdots + t_n$, the T
must derive only $t_n$, and the E in the body derives $t_1 + t_2 + \cdots + t_{n-1}$.
The reason, again, is that T cannot derive the sum of two or more terms 
without putting parentheses around them. 
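
These observations can be checked on examples by counting parse trees in the grammar of Fig. 5.19 directly. The Python sketch below (hypothetical function name, pre-tokenized input) has one counting function per variable, mirroring the productions for F, T, and E. The loops over split points correspond to T → T * F and E → E + T, where the right operand must be a single factor or a single term. For a + a * a it finds exactly one tree, whereas Fig. 5.18 shows two trees for the same string in the grammar of Fig. 5.2.

```python
import re
from functools import lru_cache

IDENT = re.compile(r"[ab][ab01]*")   # the strings generated by I -> a | b | Ia | Ib | I0 | I1


def count_trees_unambiguous(tokens):
    """Count parse trees rooted at E, in the grammar of Fig. 5.19, with yield `tokens`."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def F(i, j):                     # F -> I | (E)
        if j - i == 1:
            return 1 if IDENT.fullmatch(toks[i]) else 0
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            return E(i + 1, j - 1)
        return 0

    @lru_cache(maxsize=None)
    def T(i, j):                     # T -> F | T * F
        n = F(i, j)
        for k in range(i + 1, j - 1):
            if toks[k] == "*":
                n += T(i, k) * F(k + 1, j)
        return n

    @lru_cache(maxsize=None)
    def E(i, j):                     # E -> T | E + T
        n = T(i, j)
        for k in range(i + 1, j - 1):
            if toks[k] == "+":
                n += E(i, k) * T(k + 1, j)
        return n

    return E(0, len(toks))


print(count_trees_unambiguous(["a", "+", "a", "*", "a"]))            # 1
print(count_trees_unambiguous(["a", "+", "a", "+", "a", "+", "a"]))  # 1
```
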


5.4.3 Leftmost Derivations as a Way to Express Ambiguity


While derivations are not necessarily unique, even if the grammar is unambigu- 
ous, it turns out that, in an unambiguous grammar, leftmost derivations will 
be unique, and rightmost derivations will be unique. We shall consider leftmost 
derivations only, and state the result for rightmost derivations. 
Example 5.28: As an example, notice the two parse trees of Fig. 5.18 that
each yield a + a * a. If we construct leftmost derivations from them we get
the following leftmost derivations from trees (a) and (b), respectively:


a) E ⇒lm E + E ⇒lm I + E ⇒lm a + E ⇒lm a + E * E ⇒lm a + I * E ⇒lm a + a * E ⇒lm
   a + a * I ⇒lm a + a * a


b) E ⇒lm E * E ⇒lm E + E * E ⇒lm I + E * E ⇒lm a + E * E ⇒lm a + I * E ⇒lm
   a + a * E ⇒lm a + a * I ⇒lm a + a * a


Note that these two leftmost derivations differ. This example does not prove 
the theorem, but demonstrates how the differences in the trees force different
steps to be taken in the leftmost derivation.


Theorem 5.29: For each grammar G = (V,T, P, S) and string w in T*, w has 
two distinct parse trees if and only if w has two distinct leftmost derivations 
from S. 


PROOF: (Only-if) If we examine the construction of a leftmost derivation from 
a parse tree in the proof of Theorem 5.14, we see that wherever the two parse 
trees first have a node at which different productions are used, the leftmost 
derivations constructed will also use different productions and thus be different
derivations.


(If) While we have not previously given a direct construction of a parse tree 
from a leftmost derivation, the idea is not hard. Start constructing a tree with 
only the root, labeled S. Examine the derivation one step at a time. At each 
step, a variable will be replaced, and this variable will correspond to the leftmost
node in the tree being constructed that has no children but that has a variable
as its label. From the production used at this step of the leftmost derivation, 
determine what the children of this node should be. If there are two distinct 
derivations, then at the first step where the derivations differ, the nodes being 
constructed will get different lists of children, and this difference guarantees 
that the parse trees are distinct. 
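
The construction used in the (Only-if) direction can be sketched in a few lines of Python. The representation of parse trees (a terminal is a string, an interior node is a pair of its variable and its list of children) and the function names are our own assumptions. At each step the sketch expands the leftmost node that is still labeled by a variable, just as a leftmost derivation replaces the leftmost variable of the sentential form. Applied to the two trees of Fig. 5.18, it reproduces the two different leftmost derivations of Example 5.28.

```python
# A parse tree is either a terminal (a str) or a pair (variable, [children]).
def leftmost_derivation(tree):
    """Sentential forms of the leftmost derivation read off a parse tree."""
    def show(form):
        return " ".join(node if isinstance(node, str) else node[0] for node in form)

    form = [tree]
    steps = [show(form)]
    while True:
        # Position of the leftmost node still labeled by a variable.
        k = next((i for i, node in enumerate(form) if not isinstance(node, str)), None)
        if k is None:
            return steps
        _, children = form[k]
        form = form[:k] + list(children) + form[k + 1:]   # apply that node's production
        steps.append(show(form))


def ident(name):
    return ("E", [("I", [name])])     # E -> I -> name


tree_a = ("E", [ident("a"), "+", ("E", [ident("a"), "*", ident("a")])])   # Fig. 5.18(a)
tree_b = ("E", [("E", [ident("a"), "+", ident("a")]), "*", ident("a")])   # Fig. 5.18(b)
for tree in (tree_a, tree_b):
    print(" => ".join(leftmost_derivation(tree)))
```

The two derivations first differ at their second sentential form, E + E versus E * E, which is exactly where the two trees first use different productions.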


5.4.4 Inherent Ambiguity 


A context-free language L is said to be inherently ambiguous if all its gram- 
mars are ambiguous. If even one grammar for L is unambiguous, then L is an 
unambiguous language. We saw, for example, that the language of expressions 
generated by the grammar of Fig. 5.2 is actually unambiguous. Even though 
that grammar is ambiguous, there is another grammar for the same language 
that is unambiguous — the grammar of Fig. 5.19. 

We shall not prove that there are inherently ambiguous languages. Rather 
we shall discuss one example of a language that can be proved inherently am- 
biguous, and we shall explain intuitively why every grammar for the language 
must be ambiguous. The language L in question is:
$L = \{a^n b^n c^m d^m \mid n \ge 1, m \ge 1\} \cup \{a^n b^m c^m d^n \mid n \ge 1, m \ge 1\}$
That is, L consists of strings in $a^+b^+c^+d^+$ such that either:
1. There are as many a’s as b’s and as many c’s as d’s, or
2. There are as many a’s as d’s and as many b’s as c’s.


L is a context-free language. The obvious grammar for L is shown in 
Fig. 5.22. It uses separate sets of productions to generate the two kinds of 
strings in L. 


S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc


Figure 5.22: A grammar for an inherently ambiguous language 


This grammar is ambiguous. For example, the string aabbccdd has the two 
leftmost derivations: 


1. S ⇒lm AB ⇒lm aAbB ⇒lm aabbB ⇒lm aabbcBd ⇒lm aabbccdd


2. S ⇒lm C ⇒lm aCd ⇒lm aaDdd ⇒lm aabDcdd ⇒lm aabbccdd


and the two parse trees shown in Fig. 5.23. 
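
For the grammar of Fig. 5.22 specifically, the number of parse trees of a string is easy to compute: S → AB accounts for one tree when the a’s match the b’s and the c’s match the d’s, and S → C for one tree when the a’s match the d’s and the b’s match the c’s, since each of A, B, C, and D builds its strings in only one way. The Python sketch below (hypothetical function name) just checks both conditions. It says nothing about other grammars for L, which is the hard part of the argument.

```python
import re


def trees_in_fig_5_22(w):
    """Number of parse trees for w in S -> AB | C, A -> aAb | ab, B -> cBd | cd,
    C -> aCd | aDd, D -> bDc | bc."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", w)
    if m is None:
        return 0
    na, nb, nc, nd = (len(g) for g in m.groups())
    via_ab = na == nb and nc == nd    # S -> AB: a's match b's, c's match d's
    via_c = na == nd and nb == nc     # S -> C: a's match d's, b's match c's
    return int(via_ab) + int(via_c)


for w in ["aabbccdd", "aabbcd", "abbccd", "abcdd"]:
    print(w, trees_in_fig_5_22(w))
# aabbccdd 2 / aabbcd 1 / abbccd 1 / abcdd 0
```

Every string with equal numbers of all four symbols gets two trees in this grammar, which is the situation the argument below shows cannot be avoided entirely.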

The proof that all grammars for L must be ambiguous is complex. However, 
the essence is as follows. We need to argue that all but a finite number of the
strings whose counts of the four symbols a, b, c, and d are all equal must be
generated in two different ways: one in which the a’s and b’s are generated to
be equal and the c’s and d’s are generated to be equal, and a second way, where
the a’s and d’s are generated to be equal and likewise the b’s and c’s.

For instance, the only way to generate strings where the a’s and b’s have 
the same number is with a variable like A in the grammar of Fig. 5.22. There 
are variations, of course, but these variations do not change the basic picture. 
For instance: 


• Some small strings can be avoided, say by changing the basis production
A → ab to A → aaabbb, for instance.


• We could arrange that A shares its job with some other variables, e.g.,
by using variables A1 and A2, with A1 generating the odd numbers of a’s
and A2 generating the even numbers, as: A1 → aA2b | ab; A2 → aA1b.
Figure 5.23: Two parse trees for aabbccdd


• We could also arrange that the numbers of a’s and b’s generated by A
are not exactly equal, but off by some finite number. For instance, we
could start with a production like S → AbB and then use A → aAb | a
to generate one more a than b’s.


However, we cannot avoid some mechanism for generating a’s in a way that 
matches the count of b’s. 

Likewise, we can argue that there must be a variable like B that generates 
matching c’s and d’s. Also, variables that play the roles of C (generate match- 
ing a’s and d’s) and D (generate matching b’s and c’s) must be available in 
the grammar. The argument, when formalized, proves that no matter what 
modifications we make to the basic grammar, it will generate at least some of
the strings of the form $a^n b^n c^n d^n$ in the two ways that the grammar of Fig. 5.22
does.


5.4.5 Exercises for Section 5.4 


Exercise 5.4.1: Consider the grammar 
S → aS | aSbS | ε
This grammar is ambiguous. Show in particular that the string aab has two: 
a) Parse trees. 
b) Leftmost derivations. 


c) Rightmost derivations. 


Exercise 5.4.2: Prove that the grammar of Exercise 5.4.1 generates all and
only the strings of a’s and b’s such that every prefix has at least as many a’s as 
b’s. 


! Exercise 5.4.3: Find an unambiguous grammar for the language of Exercise 5.4.1.


! Exercise 5.4.4: Some strings of a’s and b’s have a unique parse tree in the
grammar of Exercise 5.4.1. Give an efficient test to tell whether a given string
is one of these. The test “try all parse trees to see how many yield the given
string” is not adequately efficient.


Exercise 5.4.5: This question concerns the grammar from Exercise 5.1.2, 
which we reproduce here: 


S → A1B
A → 0A | ε
B → 0B | 1B | ε


a) Show that this grammar is unambiguous. 


b) Find a grammar for the same language that is ambiguous, and demonstrate
its ambiguity.


! Exercise 5.4.6: Is your grammar from Exercise 5.1.5 unambiguous? If not,
redesign it to be unambiguous.


Exercise 5.4.7: The following grammar generates prefix expressions with
operands x and y and binary operators +, -, and *:


E → +EE | *EE | -EE | x | y


a) Find leftmost and rightmost derivations, and a derivation tree for the 
string +*-xyxy. 


! b) Prove that this grammar is unambiguous. 


5.5 Summary of Chapter 5 


+ Context-Free Grammars: A CFG is a way of describing languages by 
recursive rules called productions. A CFG consists of a set of variables, a 
set of terminal symbols, and a start variable, as well as the productions. 
Each production consists of a head variable and a body consisting of a 
string of zero or more variables and/or terminals. 


+ Derivations and Languages: Beginning with the start symbol, we derive 
terminal strings by repeatedly replacing a variable by the body of some 
production with that variable in the head. The language of the CFG is 
the set of terminal strings we can so derive; it is called a context-free 
language. 
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+ Leftmost and Rightmost Derivations: If we always replace the leftmost 
(resp. rightmost) variable in a string, then the resulting derivation is a 
leftmost (resp. rightmost) derivation. Every string in the language of a 
CFG has at least one leftmost and at least one rightmost derivation. 


+ Sentential Forms: Any step in a derivation is a string of variables and/or 
terminals. We call such a string a sentential form. If the derivation is 
leftmost (resp. rightmost), then the string is a left- (resp. right-) sentential 
form. 


+ Parse Trees: A parse tree is a tree that shows the essentials of a derivation. 
Interior nodes are labeled by variables, and leaves are labeled by terminals 
or ε. For each internal node, there must be a production such that the
head of the production is the label of the node, and the labels of its 
children, read from left to right, form the body of that production. 


+ Equivalence of Parse Trees and Derivations: A terminal string is in the 
language of a grammar if and only if it is the yield of at least one parse 
tree. Thus, the existence of leftmost derivations, rightmost derivations, 
and parse trees are equivalent conditions that each define exactly the 
strings in the language of a CFG. 


+ Ambiguous Grammars: For some CFG’s, it is possible to find a terminal 
string with more than one parse tree, or equivalently, more than one
leftmost derivation or more than one rightmost derivation. Such a grammar
is called ambiguous. 


+ Eliminating Ambiguity: For many useful grammars, such as those that 
describe the structure of programs in a typical programming language, 
it is possible to find an unambiguous grammar that generates the same 
language. Unfortunately, the unambiguous grammar is frequently more 
complex than the simplest ambiguous grammar for the language. There 
are also some context-free languages, usually quite contrived, that are 
inherently ambiguous, meaning that every grammar for that language is 
ambiguous. 


+ Parsers: The context-free grammar is an essential concept for the
implementation of compilers and other programming-language processors.
Tools such as YACC take a CFG as input and produce a parser, the
component of a compiler that deduces the structure of the program being
compiled. 


+ Document Type Definitions: The emerging XML standard for sharing 
information through Web documents has a notation, called the DTD, 
for describing the structure of such documents, through the nesting of 
semantic tags within the document. The DTD is in essence a context-free 
grammar whose language is a class of related documents. 
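The summary's description of derivations and languages can be tried out directly. The sketch below enumerates the terminal strings of a grammar up to a length bound, always expanding the leftmost variable. As the summary notes, every string in the language has a leftmost derivation, so this restriction loses nothing. Several assumptions are not from the text: variables are single upper-case letters, every other character is a terminal, and the grammar has no ε-productions. Without ε-productions, sentential forms never shrink, so any form longer than the bound can be discarded. The function name is hypothetical.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    productions: dict mapping a variable (one upper-case letter) to a list of
    bodies (strings). Assumes no ε-productions, so a sentential form never
    gets shorter and forms longer than max_len can be pruned.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable; None means form is all terminals.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            found.add(form)
            continue
        for body in productions[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]  # one leftmost step
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


# A grammar for {a^n b^n | n >= 1}: S -> aSb | ab
print(sorted(language_up_to({"S": ["aSb", "ab"]}, "S", 6), key=len))
```

The `seen` set makes the search stop even when the grammar has unit-production cycles, because only finitely many forms fit under the bound.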
5.6 Gradiance Problems for Chapter 5


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 5.1: Let G be the grammar:
S → SS | (S) | ε


L(G) is the language BP of all strings of balanced parentheses, that is, those 
strings that could appear in a well-formed arithmetic expression. We want to 
prove that L(G) = BP, which requires two inductive proofs: 


1. If w is in L(G), then w is in BP. 
2. If w is in BP, then w is in L(G).


We shall here prove only the first. You will see below a sequence of steps in the 
proof, each with a reason left out. These reasons belong to one of three classes: 


A) Use of the inductive hypothesis. 


B) Reasoning about properties of grammars, e.g., that every derivation has 
at least one step. 


C) Reasoning about properties of strings, e.g., that every string is longer 
than any of its proper substrings. 


The proof is an induction on the number of steps in the derivation of w. You 
should decide on the reason for each step in the proof below, and then identify 
from the available choices a correct pair consisting of a step and a kind of reason 
(A, B, or C). 


Basis: One step. 
1. The only 1-step derivation of a terminal string is S ⇒ ε because:
2. ε is in BP because:

Induction: An n-step derivation for some n > 1. 
3. The derivation S ⇒^n w is either of the form


a) S ⇒ SS ⇒^(n−1) w or of the form
b) S ⇒ (S) ⇒^(n−1) w
because:
Case (a): 


4. w = xy, for some strings x and y such that S ⇒^p x and S ⇒^q y, where
p < n and q < n because:


5. x is in BP because: 
6. y is in BP because:
7. w is in BP because:
Case (b): 
8. w = (z) for some string z such that S ⇒^(n−1) z because:
9. z is in BP because: 


10. w is in BP because: 
Problem 5.2: Let G be the grammar:
S → SS | (S) | ε


L(G) is the language BP of all strings of balanced parentheses, that is, those 
strings that could appear in a well-formed arithmetic expression. We want to 
prove that L(G) = BP, which requires two inductive proofs: 


1. If w is in L(G), then w is in BP. 
2. If w is in BP, then w is in L(G).


We shall here prove only the second. You will see below a sequence of steps 
in the proof, each with a reason left out. These reasons belong to one of three 
classes: 


A) Use of the inductive hypothesis. 


B) Reasoning about properties of grammars, e.g., that every derivation has 
at least one step. 


C) Reasoning about properties of strings, e.g., that every string is longer 
than any of its proper substrings. 


The proof is an induction on the length of w. You
should decide on the reason for each step in the proof below, and then identify
from the available choices a correct pair consisting of a step and a kind of reason
(A, B, or C).


Basis: Length = 0. 
1. The only string of length 0 in BP is ε because:
2. ε is in L(G) because:


Induction: |w| = n > 0.


3. w is of the form (x)y, where (x) is the shortest nonempty prefix of w that is
in BP, and y is the remainder of w because:
4. x is in BP because:
5. y is in BP because:
6. |x| < n because:
7. |y| < n because:
8. x is in L(G) because:
9. y is in L(G) because:
10. (x) is in L(G) because:
11. w is in L(G) because:


Problem 5.3: Here are eight simple grammars, each of which generates an 
infinite language of strings. These strings tend to look like alternating a’s and 
b’s, although there are some exceptions, and not all grammars generate all such
strings.
1. S → abS | ab
2. S → SS | ab
3. S → aB; B → bS | a
4. S → aB; B → bS | b
5. S → aB; B → bS | ab
6. S → aB | b; B → bS
7. S → aB | a; B → bS
8. S → aB | ab; B → bS


The initial symbol is S in all cases. Determine the language of each of these 
grammars. Then, find, in the list below, the pair of grammars that define the 
same language. 
Problem 5.4: Consider the grammar G and the language L:

G: S → AB | a | abC; A → b; C → abC | c

L: {w | w a string of a’s, b’s, and c’s with an equal number of a’s and b’s} 

Grammar G does not define language L. To prove this, we use a string that
either is produced by G and not contained in L or is contained in L but is not 
produced by G. Which string can be used to prove it? 


Problem 5.5: Consider the grammars: 

G1: S → AB | a | abC; A → b; C → abC | c

G2: S → a | b | cC; C → cC | ε

These grammars do not define the same language. To prove this, we use a string
that is generated by one but not by the other grammar. Which of the following 
strings can be used for this proof? 


Problem 5.6: Consider the language L = {a}. Which grammar defines L?
Problem 5.7: Consider the grammars: 

G1: S → SaS | a

G2: S → SS | ε

G3: S → SS | a

G4: S → SS | aa

G5: S → Sa | a

G6: S → aSa | aa | a

G7: S → SAS | ε


Describe the language of each of these grammars. Then, identify from the list 
below a pair of grammars that define the same language. 


Problem 5.8: Consider the following languages and grammars. 
G1: S → aA | aS; A → ab
G2: S → abS | aA; A → a
G3: S → Sa | AB; A → aA | a; B → b
G4: S → aS | b
L1: {ab | i = 1, 2, ...}
L2: {(ab)^i aa | i = 0, 1, ...}
L3: {a^i b | i = 2, 3, ...}
L4: {a^i b a^j | i = 1, 2, ...; j = 0, 1, ...}
L5: {a^i b | i = 0, 1, ...}


Match each grammar with the language it defines. Then, identify a correct 
match from the list below. 


Problem 5.9: Here is a context-free grammar G: 


S → AB
A → 0A1 | 2
B → 1B | 3A


Which of the following strings is in L(G)? 


Problem 5.10: Identify in the list below a sentence of length 6 that is
generated by the grammar:


S → (S)S | ε
Problem 5.11: Consider the grammar G with start symbol S: 
S → bS | aA | b
A → bA | aB
B → bB | aS | a
Which of the following is a word in L(G)? 


Problem 5.12: Here is a parse tree that uses some unknown grammar G 
[shown on-line by the Gradiance system]. Which of the following productions 
is surely one of those for grammar G? 


Problem 5.13: The parse tree below [shown on-line by the Gradiance system] 
represents a rightmost derivation according to the grammar 


S → AB; A → aS | a; B → bA
Which of the following is a right-sentential form in this derivation? 
Problem 5.14: Consider the grammar: 
S → SS; S → ab


Identify in the list below the one set of parse trees that includes a tree that is
not a parse tree of this grammar.


Problem 5.15: Which of the parse trees below [shown on-line by the
Gradiance system] yield the same word?
Problem 5.16: Programming languages are often described using an extended
form of context-free grammar, where square brackets are used to denote an
optional construct. For example, A → B[C]D says that an A can be replaced
by a B and a D, with an optional C between them. This notation does not
allow us to describe anything but context-free languages, since an extended
production can always be replaced by several conventional productions.
Suppose a grammar has the extended productions: 


A → U[VW]XY | UV[WX]Y


[U,...,Y are strings that will be provided on-line by the Gradiance system.] 
Convert this pair of extended productions to conventional productions. Identify, 
from the list below, the conventional productions that are equivalent to the 
extended productions above. 
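The replacement of an optional construct by conventional productions can be mechanized. The helper below is hypothetical. It assumes brackets are not nested and treats each bracketed part as kept or dropped independently, so a body with k optional parts yields at most 2^k conventional bodies. Single letters stand in for U, ..., Y here, whereas the problem supplies them as strings on-line.

```python
import itertools
import re


def expand_optional(body):
    """Expand each non-nested [...] in an extended body into conventional bodies.

    Every optional part is either kept or dropped, so a body with k optional
    parts yields at most 2**k conventional bodies (duplicates are removed).
    """
    # With one capturing group, re.split alternates plain text and bracket contents.
    pieces = re.split(r"\[([^\[\]]*)\]", body)
    plain, optional = pieces[0::2], pieces[1::2]
    bodies = []
    for keep in itertools.product([True, False], repeat=len(optional)):
        out = plain[0]
        for kept, opt, rest in zip(keep, optional, plain[1:]):
            out += (opt if kept else "") + rest
        if out not in bodies:
            bodies.append(out)
    return bodies


# A -> B[C]D is equivalent to A -> BCD | BD.
print(expand_optional("B[C]D"))

# The pair of extended productions above, with letters standing in for U..Y;
# dict.fromkeys drops the body UVWXY, which both productions contribute.
both = expand_optional("U[VW]XY") + expand_optional("UV[WX]Y")
print(list(dict.fromkeys(both)))
```

With these stand-ins, the pair becomes A → UVWXY | UXY | UVY. Actual strings for U, ..., Y could cause further coincidences.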


Problem 5.17: Programming languages are often described using an extended 
form of context-free grammar, where curly brackets are used to denote a
construct that can repeat 0, 1, 2, or any number of times. For example,
A → B{C}D says that an A can be replaced by a B and a D, with any number of
C’s (including 0) between them. This notation does not allow us to describe 
anything but context-free languages, since an extended production can always 
be replaced by several conventional productions. 
Suppose a grammar has the extended production: 


A → U{V}W


[U, V, and W are strings that will be provided on-line by the Gradiance system.]
Convert this extended production to conventional productions. Identify, from 
the list below, the conventional productions that are equivalent to the extended 
production above. 
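A repeated construct can be eliminated with one new variable that generates any number of copies of the braced part. The sketch below is a hypothetical helper. It expands only the first {...} per call, and it writes ε as the empty string. The name R for the fresh variable is chosen purely for illustration and must not already occur in the grammar.

```python
import re


def expand_repetition(head, body, fresh):
    """Replace the first {...} in an extended production by a fresh variable.

    Returns conventional productions as (head, body) pairs, with "" for ε.
    `fresh` must be a variable that does not occur elsewhere in the grammar.
    """
    m = re.search(r"\{([^{}]*)\}", body)
    if m is None:
        return [(head, body)]  # nothing to expand
    repeated = m.group(1)
    return [
        (head, body[:m.start()] + fresh + body[m.end():]),
        (fresh, repeated + fresh),  # one more copy of the repeated part ...
        (fresh, ""),                # ... or no more copies
    ]


for h, b in expand_repetition("A", "U{V}W", "R"):
    print(f"{h} -> {b or 'ε'}")
```

The left-recursive pair R → RV | ε would serve equally well, so the right-recursive choice here is arbitrary. A body with several braced parts can be handled by calling the helper repeatedly, each time with a different fresh variable.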


Problem 5.18: The grammar G: 
S → SS | a | b


is ambiguous. That means at least some of the strings in its language have 
more than one leftmost derivation. However, it may be that some strings in the 
language have only one derivation. Identify from the list below a string that 
has exactly two leftmost derivations in G. 


Problem 5.19: This question concerns the grammar: 


S → AbB
A → aA | ε
B → aB | bB | ε


Find a leftmost derivation of the string XbY [X and Y are strings that will 
be provided on-line by the Gradiance system]. Then, identify one of the
left-sentential forms of this derivation from the list below.
5.7 References for Chapter 5


The context-free grammar was first proposed as a description method for
natural languages by Chomsky [4]. A similar idea was used shortly thereafter to
describe computer languages: Fortran by Backus [2] and Algol by Naur [7].
As a result, CFG’s are sometimes referred to as “Backus-Naur form grammars.”

Ambiguity in grammars was identified as a problem by Cantor [3] and Floyd 
[5] at about the same time. Inherent ambiguity was first addressed by Gross 
[6]. 

For applications of CFG’s in compilers, see [1]. DTD’s are defined in the 
standards document for XML [8]. 
Chapter 6


Pushdown Automata 


The context-free languages have a type of automaton that defines them. This
automaton, called a “pushdown automaton,” is an extension of the
nondeterministic finite automaton with ε-transitions, which is one of the ways to define
the regular languages. The pushdown automaton is essentially an ε-NFA with
the addition of a stack. The stack can be read, pushed, and popped only at the
top, just like the “stack” data structure.

In this chapter, we define two different versions of the pushdown automaton: 
one that accepts by entering an accepting state, like finite automata do, and 
another version that accepts by emptying its stack, regardless of the state it is in. 
We show that these two variations accept exactly the context-free languages; 
that is, grammars can be converted to equivalent pushdown automata, and 
vice-versa. We also consider briefly the subclass of pushdown automata that is 
deterministic. These accept all the regular languages, but only a proper subset 
of the CFL’s. Since they resemble closely the mechanics of the parser in a 
typical compiler, it is important to observe what language constructs can and 
cannot be recognized by deterministic pushdown automata. 


6.1 Definition of the Pushdown Automaton 


In this section we introduce the pushdown automaton, first informally, then as 
a formal construct. 


6.1.1 Informal Introduction 


The pushdown automaton is in essence a nondeterministic finite automaton 
with ε-transitions permitted and one additional capability: a stack on which it
can store a string of “stack symbols.” The presence of a stack means that, unlike 
the finite automaton, the pushdown automaton can “remember” an infinite 
amount of information. However, unlike a general-purpose computer, which 
also has the ability to remember arbitrarily large amounts of information, the 
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pushdown automaton can only access the information on its stack in a last-in- 
first-out way. 

As a result, there are languages that could be recognized by some computer
program, but are not recognizable by any pushdown automaton. In fact,
pushdown automata recognize all and only the context-free languages. While there
are many languages that are context-free, including some we have seen that are
not regular languages, there are also some simple-to-describe languages that are
not context-free, as we shall see in Section 7.2. An example of a non-context-free
language is {0^n 1^n 2^n | n ≥ 1}, the set of strings consisting of equal groups
of 0’s, 1’s, and 2’s.


[Diagram: Input → Finite state control → Accept/reject; the finite state control is also connected to a Stack.]


Figure 6.1: A pushdown automaton is essentially a finite automaton with a
stack data structure


We can view the pushdown automaton informally as the device suggested 
in Fig. 6.1. A “finite-state control” reads inputs, one symbol at a time. The 
pushdown automaton is allowed to observe the symbol at the top of the stack 
and to base its transition on its current state, the input symbol, and the symbol 
at the top of stack. Alternatively, it may make a “spontaneous” transition, using 
ε as its input instead of an input symbol. In one transition, the pushdown
automaton: 


1. Consumes from the input the symbol that it uses in the transition. If ε is
used for the input, then no input symbol is consumed.


2. Goes to a new state, which may or may not be the same as the previous 
state. 


3. Replaces the symbol at the top of the stack by any string. The string 
could be ε, which corresponds to a pop of the stack. It could be the same
symbol that appeared at the top of the stack previously; i.e., no change 
to the stack is made. It could also replace the top stack symbol by one 
other symbol, which in effect changes the top of the stack but does not 
push or pop it. Finally, the top stack symbol could be replaced by two or 
more symbols, which has the effect of (possibly) changing the top stack 
symbol, and then pushing one or more new symbols onto the stack. 
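The four cases of item 3 are all the same operation, namely replacing the top symbol by a string. The sketch below illustrates that operation under two assumptions. First, stack symbols are single characters. Second, the stack is written as a string with its top at the left end, which is the convention this chapter adopts later for instantaneous descriptions. The function name is illustrative.

```python
def replace_top(stack, gamma):
    """Replace the top stack symbol by the string gamma.

    The stack is a string whose leftmost character is the top, so the first
    symbol of gamma (if any) becomes the new top.
    """
    if not stack:
        raise ValueError("an empty stack has no top symbol to replace")
    return gamma + stack[1:]


stack = "XZ"                        # X is on top of Z
print(replace_top(stack, ""))       # pop:                  Z
print(replace_top(stack, "X"))      # no change:            XZ
print(replace_top(stack, "Y"))      # change the top:       YZ
print(replace_top(stack, "YX"))     # keep X, push Y on it: YXZ
```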


Example 6.1: Let us consider the language 
Lwwr = {ww^R | w is in (0 + 1)*}


This language, often referred to as “w-w-reversed,” is the set of even-length
palindromes over alphabet {0,1}. It is a CFL, generated by the grammar of Fig. 5.1,
with the productions P → 0 and P → 1 omitted.

We can design an informal pushdown automaton accepting Lwwr, as
follows.¹


1. Start in a state q0 that represents a “guess” that we have not yet seen the
middle; i.e., we have not seen the end of the string w that is to be followed
by its own reverse. While in state q0, we read symbols and store them on
the stack, by pushing a copy of each input symbol onto the stack, in turn.


2. At any time, we may guess that we have seen the middle, i.e., the end of 
w. At this time, w will be on the stack, with the right end of w at the top 
and the left end at the bottom. We signify this choice by spontaneously 
going to state q1. Since the automaton is nondeterministic, we actually
make both guesses: we guess we have seen the end of w, but we also stay
in state q0 and continue to read inputs and store them on the stack.


3. Once in state q1, we compare input symbols with the symbol at the top
of the stack. If they match, we consume the input symbol, pop the stack,
and proceed. If they do not match, we have guessed wrong; our guessed
w was not followed by w^R. This branch dies, although other branches
of the nondeterministic automaton may survive and eventually lead to 
acceptance. 


4. If we empty the stack, then we have indeed seen some input w followed 
by w^R. We accept the input that was read up to this point.
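To see what the nondeterministic guesses compute, the sketch below explores them one at a time. Each value of i is one branch of the informal automaton: it pushes the first i symbols while in q0, then switches to q1 and matches the rest against the stack. This deterministic loop is not itself a pushdown automaton. It is only an illustration, and it uses a Python list whose last element is the top of the stack.

```python
def accepts_wwr(s):
    """Try every guess of the middle of s, as the informal PDA for Lwwr does.

    A branch accepts if it empties the stack exactly when the input runs out.
    """
    for i in range(len(s) + 1):            # every possible moment to guess
        stack = list(s[:i])                # state q0: push each symbol read
        alive = True
        for symbol in s[i:]:               # state q1: compare with the top
            if not stack or stack[-1] != symbol:
                alive = False              # wrong guess: this branch dies
                break
            stack.pop()
        if alive and not stack:
            return True                    # some branch saw w followed by w^R
    return False


for s in ["", "0110", "1111", "0101", "010"]:
    print(repr(s), accepts_wwr(s))
```

A real PDA pursues all branches at once. The loop simply makes the set of branches explicit.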


6.1.2 The Formal Definition of Pushdown Automata 


Our formal notation for a pushdown automaton (PDA) involves seven
components. We write the specification of a PDA P as follows:


P = (Q, Σ, Γ, δ, q0, Z0, F)
The components have the following meanings: 


Q: A finite set of states, like the states of a finite automaton. 


Σ: A finite set of input symbols, also analogous to the corresponding
component of a finite automaton.
¹We could also design a pushdown automaton for Lpal, which is the language whose
grammar appeared in Fig. 5.1. However, Lwwr is slightly simpler and will allow us to focus
on the important ideas regarding pushdown automata.
No “Mixing and Matching”


There may be several pairs that are options for a PDA in some situation. 
For instance, suppose δ(q, a, X) = {(p, YZ), (r, ε)}. When making a move
of the PDA, we have to choose one pair in its entirety; we cannot pick a
state from one and a stack-replacement string from another. Thus, in state
q, with X on the top of the stack, reading input a, we could go to state p
and replace X by YZ, or we could go to state r and pop X. However, we
cannot go to state p and pop X, and we cannot go to state r and replace 
X by YZ. 


Γ: A finite stack alphabet. This component, which has no finite-automaton
analog, is the set of symbols that we are allowed to push onto the stack. 


δ: The transition function. As for a finite automaton, δ governs the behavior
of the automaton. Formally, δ takes as argument a triple (q, a, X), where:


1. q is a state in Q.


2. a is either an input symbol in Σ or a = ε, the empty string, which is
assumed not to be an input symbol.


3. X is a stack symbol, that is, a member of Γ.


The output of δ is a finite set of pairs (p, γ), where p is the new state, and
γ is the string of stack symbols that replaces X at the top of the stack.
For instance, if γ = ε, then the stack is popped, if γ = X, then the stack
is unchanged, and if γ = YZ, then X is replaced by Z, and Y is pushed
onto the stack.


q0: The start state. The PDA is in this state before making any transitions.


Z0: The start symbol. Initially, the PDA’s stack consists of one instance of
this symbol, and nothing else. 


F: The set of accepting states, or final states. 
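One natural way to hold δ in a program is a dictionary from triples (q, a, X) to sets of pairs (p, γ). The sketch below makes this representation concrete, together with the sidebar's point that each pair is used as a unit. Several representation choices are assumptions rather than part of the definition: ε is the empty string, states and stack symbols are single-character strings, and the stack is a string with its top at the left. The extra stack symbol W below X is hypothetical and exists only to show that the rest of the stack is untouched.

```python
EPS = ""  # the empty string ε

# Hypothetical δ holding the two options from the sidebar:
# δ(q, a, X) = {(p, YZ), (r, ε)}. Keys are (state, input or ε, top symbol).
delta = {("q", "a", "X"): {("p", "YZ"), ("r", EPS)}}


def successors(delta, state, a, stack):
    """(state, stack) pairs reachable in one move that reads a (or ε).

    Each pair (p, gamma) in δ is used as a unit: the new state and the
    replacement string always come from the same pair, never one from each.
    """
    if not stack:
        return set()
    top, below = stack[0], stack[1:]
    return {(p, gamma + below) for p, gamma in delta.get((state, a, top), set())}


print(sorted(successors(delta, "q", "a", "XW")))
```

The two results correspond to "go to p and replace X by YZ" and "go to r and pop X". The mixed combinations (p, "W") and (r, "YZW") never appear.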


Example 6.2: Let us design a PDA P to accept the language Lwwr of Exam- 
ple 6.1. First, there are a few details not present in that example that we need 
to understand in order to manage the stack properly. We shall use a stack
symbol Z0 to mark the bottom of the stack. We need to have this symbol present
so that, after we pop w off the stack and realize that we have seen ww^R on the
input, we still have something on the stack to permit us to make a transition
to the accepting state, q2. Thus, our PDA for Lwwr can be described as


P = ({q0, q1, q2}, {0, 1}, {0, 1, Z0}, δ, q0, Z0, {q2})


where δ is defined by the following rules:
1. δ(q0, 0, Z0) = {(q0, 0Z0)} and δ(q0, 1, Z0) = {(q0, 1Z0)}. One of these
rules applies initially, when we are in state q0 and we see the start symbol
Z0 at the top of the stack. We read the first input, and push it onto the
stack, leaving Z0 below to mark the bottom.


2. δ(q0, 0, 0) = {(q0, 00)}, δ(q0, 0, 1) = {(q0, 01)}, δ(q0, 1, 0) = {(q0, 10)}, and
δ(q0, 1, 1) = {(q0, 11)}. These four similar rules allow us to stay in state
q0 and read inputs, pushing each onto the top of the stack and leaving
the previous top stack symbol alone.


3. δ(q0, ε, Z0) = {(q1, Z0)}, δ(q0, ε, 0) = {(q1, 0)}, and δ(q0, ε, 1) = {(q1, 1)}.
These three rules allow P to go from state q0 to state q1 spontaneously
(on ε input), leaving intact whatever symbol is at the top of the stack.


4. δ(q1, 0, 0) = {(q1, ε)}, and δ(q1, 1, 1) = {(q1, ε)}. Now, in state q1 we can
match input symbols against the top symbols on the stack, and pop when
the symbols match.


5. δ(q1, ε, Z0) = {(q2, Z0)}. Finally, if we expose the bottom-of-stack marker
Z0 and we are in state q1, then we have found an input of the form ww^R.
We go to state q2 and accept.
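The five groups of rules can be checked mechanically. The sketch below encodes δ of Example 6.2 as a Python dictionary and searches the configurations breadth-first. It accepts when some branch has used up the input and is in q2, which is acceptance by final state as the example describes it. Some choices are assumptions made for the illustration: "Z" stands for Z0 so that every stack symbol is one character, the stack is a string with its top at the left, and the function name is hypothetical. The search terminates for this PDA because every push consumes an input symbol. A general PDA whose ε-moves push could reach infinitely many configurations and would need a bound.

```python
from collections import deque

EPS = ""   # ε
Z = "Z"    # stands for the bottom-of-stack marker Z0

# δ of Example 6.2: (state, input or ε, top) -> {(new state, replacement)}
delta = {
    ("q0", "0", Z): {("q0", "0" + Z)}, ("q0", "1", Z): {("q0", "1" + Z)},
    ("q0", "0", "0"): {("q0", "00")}, ("q0", "0", "1"): {("q0", "01")},
    ("q0", "1", "0"): {("q0", "10")}, ("q0", "1", "1"): {("q0", "11")},
    ("q0", EPS, Z): {("q1", Z)}, ("q0", EPS, "0"): {("q1", "0")},
    ("q0", EPS, "1"): {("q1", "1")},
    ("q1", "0", "0"): {("q1", EPS)}, ("q1", "1", "1"): {("q1", EPS)},
    ("q1", EPS, Z): {("q2", Z)},
}


def accepts_by_final_state(delta, start, start_symbol, finals, w):
    """Breadth-first search over (state, remaining input, stack) triples."""
    first = (start, w, start_symbol)
    seen, queue = {first}, deque([first])
    while queue:
        q, rest, stack = queue.popleft()
        if rest == "" and q in finals:
            return True
        if not stack:
            continue                                  # no top symbol, no move
        reads = [(EPS, rest)]                         # spontaneous moves
        if rest:
            reads.append((rest[0], rest[1:]))         # moves reading one symbol
        for a, remaining in reads:
            for p, gamma in delta.get((q, a, stack[0]), set()):
                nxt = (p, remaining, gamma + stack[1:])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


for w in ["", "0110", "1111", "0101", "010"]:
    print(repr(w), accepts_by_final_state(delta, "q0", Z, {"q2"}, w))
```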


6.1.3 A Graphical Notation for PDA’s 


The list of δ facts, as in Example 6.2, is not too easy to follow. Sometimes, a
diagram, generalizing the transition diagram of a finite automaton, will make 
aspects of the behavior of a given PDA clearer. We shall therefore introduce 
and subsequently use a transition diagram for PDA’s in which: 


a) The nodes correspond to the states of the PDA. 


b) An arrow labeled Start indicates the start state, and doubly circled states 
are accepting, as for finite automata. 


c) The arcs correspond to transitions of the PDA in the following sense. An 
arc labeled a, X/α from state q to state p means that δ(q, a, X) contains
the pair (p, α), perhaps among other pairs. That is, the arc label tells
what input is used, and also gives the old and new tops of the stack.


The only thing that the diagram does not tell us is which stack symbol is the 
start symbol. Conventionally, it is Z0, unless we indicate otherwise.


Example 6.3: The PDA of Example 6.2 is represented by the diagram shown 
in Fig. 6.2. 
[Diagram: Start → q0; loop on q0 labeled 0, Z0/0Z0; 1, Z0/1Z0; 0, 0/00; 0, 1/01; 1, 0/10; 1, 1/11. Arc q0 → q1 labeled ε, Z0/Z0; ε, 0/0; ε, 1/1. Loop on q1 labeled 0, 0/ε; 1, 1/ε. Arc q1 → q2 labeled ε, Z0/Z0; q2 is accepting.]


Figure 6.2: Representing a PDA as a generalized transition diagram


6.1.4 Instantaneous Descriptions of a PDA 


To this point, we have only an informal notion of how a PDA “computes.” Intu- 
itively, the PDA goes from configuration to configuration, in response to input 
symbols (or sometimes ε), but unlike the finite automaton, where the state is
the only thing that we need to know about the automaton, the PDA’s config- 
uration involves both the state and the contents of the stack. Being arbitrarily 
large, the stack is often the more important part of the total configuration of 
the PDA at any time. It is also useful to represent as part of the configuration 
the portion of the input that remains. 

Thus, we shall represent the configuration of a PDA by a triple (q, w, γ),
where


1. q is the state,
2. w is the remaining input, and
3. γ is the stack contents.


Conventionally, we show the top of the stack at the left end of γ and the bottom
at the right end. Such a triple is called an instantaneous description, or ID, of 
the pushdown automaton. 

For finite automata, the δ̂ notation was sufficient to represent sequences
of instantaneous descriptions through which a finite automaton moved, since 
the ID for a finite automaton is just its state. However, for PDA’s we need a 
notation that describes changes in the state, the input, and stack. Thus, we 
adopt the “turnstile” notation for connecting pairs of ID’s that represent one 
or many moves of a PDA. 

Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. Define ⊢_P, or just ⊢ when P is
understood, as follows. Suppose δ(q, a, X) contains (p, α). Then for all strings
w in Σ* and β in Γ*:
(q, aw, Xβ) ⊢ (p, w, αβ)
This move reflects the idea that, by consuming a (which may be ε) from the
input and replacing X on top of the stack by α, we can go from state q to state
p. Note that what remains on the input, w, and what is below the top of the
stack, β, do not influence the action of the PDA; they are merely carried along,
perhaps to influence events later.
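The one-move relation can be transcribed almost literally. In this sketch, an ID is a Python tuple (state, remaining input, stack) with the top of the stack at the left, and "Z" stands for Z0. To keep the sketch short, only the rules of Example 6.2 that can fire when the input consists entirely of 1's are included. These are illustration choices, not notation from the text. As the paragraph above observes, w and β are merely copied through.

```python
EPS = ""  # ε; stacks are strings with the top at the left, "Z" stands for Z0

# The rules of Example 6.2 that can fire when the input consists only of 1's.
delta = {
    ("q0", "1", "Z"): {("q0", "1Z")},
    ("q0", "1", "1"): {("q0", "11")},
    ("q0", EPS, "Z"): {("q1", "Z")},
    ("q0", EPS, "1"): {("q1", "1")},
    ("q1", "1", "1"): {("q1", EPS)},
    ("q1", EPS, "Z"): {("q2", "Z")},
}


def step(delta, id_):
    """All IDs J such that id_ ⊢ J, following (q, aw, Xβ) ⊢ (p, w, αβ)."""
    q, rest, stack = id_
    if not stack:
        return set()  # no top symbol X, so no move is possible
    X, beta = stack[0], stack[1:]
    # a = ε leaves the input alone; a = rest[0] consumes one symbol.
    reads = [(EPS, rest)] + ([(rest[0], rest[1:])] if rest else [])
    return {(p, w, alpha + beta)
            for a, w in reads
            for p, alpha in delta.get((q, a, X), set())}


print(sorted(step(delta, ("q0", "1111", "Z"))))
```

The two IDs printed, ('q0', '111', '1Z') and ('q1', '1111', 'Z'), correspond to the two choices of move discussed in Example 6.4.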

We also use the symbol ⊢*_P, or ⊢* when the PDA P is understood, to represent
zero or more moves of the PDA. That is:

BASIS: I ⊢* I for any ID I.

INDUCTION: I ⊢* J if there exists some ID K such that I ⊢ K and K ⊢* J.
That is, I ⊢* J if there is a sequence of ID’s K1, K2, ..., Kn such that I = K1,
J = Kn, and for all i = 1, 2, ..., n − 1, we have Ki ⊢ Ki+1.
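The basis and induction of ⊢* correspond to a reachability search. The sketch below reuses the same illustrative encoding: tuples for ID's, the top of the stack at the left, and "Z" for Z0, with only the rules of Example 6.2 that apply to inputs of 1's. It collects every J with I ⊢* J. The search terminates only when finitely many ID's are reachable. That holds for this PDA because every push consumes an input symbol, but it is not true of PDA's in general.

```python
from collections import deque

EPS = ""  # ε; stacks are strings with the top at the left, "Z" stands for Z0

# The rules of Example 6.2 that can fire when the input consists only of 1's.
delta = {
    ("q0", "1", "Z"): {("q0", "1Z")},
    ("q0", "1", "1"): {("q0", "11")},
    ("q0", EPS, "Z"): {("q1", "Z")},
    ("q0", EPS, "1"): {("q1", "1")},
    ("q1", "1", "1"): {("q1", EPS)},
    ("q1", EPS, "Z"): {("q2", "Z")},
}


def step(delta, id_):
    """All IDs J such that id_ ⊢ J."""
    q, rest, stack = id_
    if not stack:
        return set()
    X, beta = stack[0], stack[1:]
    reads = [(EPS, rest)] + ([(rest[0], rest[1:])] if rest else [])
    return {(p, w, alpha + beta)
            for a, w in reads
            for p, alpha in delta.get((q, a, X), set())}


def reachable(delta, start):
    """All IDs J with start ⊢* J.

    The basis (I ⊢* I) puts start in the result; the induction step
    (I ⊢ K and K ⊢* J) is carried out by breadth-first search.
    """
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in step(delta, queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


ids = reachable(delta, ("q0", "1111", "Z"))
print(len(ids), ("q2", "", "Z") in ids)   # prints: 17 True
```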


Example 6.4: Let us consider the action of the PDA of Example 6.2 on the
input 1111. Since q0 is the start state and Z0 is the start symbol, the initial ID
is (q0, 1111, Z0). On this input, the PDA has an opportunity to guess wrongly
several times. The entire sequence of ID’s that the PDA can reach from the
initial ID (q0, 1111, Z0) is shown in Fig. 6.3. Arrows represent the ⊢ relation.


[Diagram: a tree of the ID’s reachable from (q0, 1111, Z0), with arrows for the ⊢ relation.]


Figure 6.3: ID’s of the PDA of Example 6.2 on input 1111
Notational Conventions for PDA’s


We shall continue using conventions regarding the use of symbols that 
we introduced for finite automata and grammars. In carrying over the 
notation, it is useful to realize that the stack symbols play a role analogous 
to the union of the terminals and variables in a CFG. Thus: 


1. Symbols of the input alphabet will be represented by lower-case letters
near the beginning of the alphabet, e.g., a, b.


2. States will be represented by q and p, typically, or other letters that
are nearby in alphabetical order.


3. Strings of input symbols will be represented by lower-case letters
near the end of the alphabet, e.g., w or z.


4. Stack symbols will be represented by capital letters near the end of
the alphabet, e.g., X or Y.


5. Strings of stack symbols will be represented by Greek letters, e.g., α
or γ.


From the initial ID, there are two choices of move. The first guesses that
the middle has not been seen and leads to ID (q0, 111, 1Z0). In effect, a 1 has
been removed from the input and pushed onto the stack.

The second choice from the initial ID guesses that the middle has been
reached. Without consuming input, the PDA goes to state q1, leading to the
ID (q1, 1111, Z0). Since the PDA may accept if it is in state q1 and sees Z0 on
top of its stack, the PDA goes from there to ID (q2, 1111, Z0). That ID is not
exactly an accepting ID, since the input has not been completely consumed.
Had the input been ε rather than 1111, the same sequence of moves would have
led to ID (q2, ε, Z0), which would show that ε is accepted.

The PDA may also guess that it has seen the middle after reading one 1, that
is, when it is in the ID (q0, 111, 1Z0). That guess also leads to failure, since
the entire input cannot be consumed. The correct guess, that the middle is
reached after reading two 1’s, gives us the sequence of ID’s (q0, 1111, Z0) ⊢
(q0, 111, 1Z0) ⊢ (q0, 11, 11Z0) ⊢ (q1, 11, 11Z0) ⊢ (q1, 1, 1Z0) ⊢ (q1, ε, Z0) ⊢
(q2, ε, Z0).
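A listed sequence of ID's is a computation exactly when each consecutive pair is related by ⊢. The sketch below checks this for two sequences from the example: the correct guess and the guess made before any input is read. It uses the same illustrative encoding as before, with tuples, the top of the stack at the left, and "Z" for Z0. The helper names are hypothetical.

```python
EPS = ""  # ε; stacks are strings with the top at the left, "Z" stands for Z0

# The rules of Example 6.2 that can fire when the input consists only of 1's.
delta = {
    ("q0", "1", "Z"): {("q0", "1Z")},
    ("q0", "1", "1"): {("q0", "11")},
    ("q0", EPS, "Z"): {("q1", "Z")},
    ("q0", EPS, "1"): {("q1", "1")},
    ("q1", "1", "1"): {("q1", EPS)},
    ("q1", EPS, "Z"): {("q2", "Z")},
}


def step(delta, id_):
    """All IDs J such that id_ ⊢ J."""
    q, rest, stack = id_
    if not stack:
        return set()
    X, beta = stack[0], stack[1:]
    reads = [(EPS, rest)] + ([(rest[0], rest[1:])] if rest else [])
    return {(p, w, alpha + beta)
            for a, w in reads
            for p, alpha in delta.get((q, a, X), set())}


def is_computation(delta, ids):
    """True if each ID in the list yields the next one in exactly one move."""
    return all(nxt in step(delta, cur) for cur, nxt in zip(ids, ids[1:]))


correct_guess = [("q0", "1111", "Z"), ("q0", "111", "1Z"), ("q0", "11", "11Z"),
                 ("q1", "11", "11Z"), ("q1", "1", "1Z"), ("q1", "", "Z"),
                 ("q2", "", "Z")]
early_guess = [("q0", "1111", "Z"), ("q1", "1111", "Z"), ("q2", "1111", "Z")]

for name, run in [("correct", correct_guess), ("early", early_guess)]:
    q, rest, _ = run[-1]
    # Legal computation?  Does it end in q2 with the whole input consumed?
    print(name, is_computation(delta, run), q == "q2" and rest == "")
```

Both sequences are legal. Only the first one ends with the input consumed in q2.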


There are three important principles about ID’s and their transitions that 
we shall need in order to reason about PDA’s: 


1. If a sequence of ID’s (computation) is legal for a PDA P, then the
computation formed by adding the same additional input string to the end of
the input (second component) in each ID is also legal.


2. If a computation is legal for a PDA P, then the computation formed by 
adding the same additional stack symbols below the stack in each ID is 
also legal. 


3. If a computation is legal for a PDA P, and some tail of the input is not 
consumed, then we can remove this tail from the input in each ID, and 
the resulting computation will still be legal. 


Intuitively, data that P never looks at cannot affect its computation. We
formalize points (1) and (2) in a single theorem.


Theorem 6.5: If P = (Q, Σ, Γ, δ, q0, Z0, F) is a PDA, and (q, x, α) ⊢*_P (p, y, β),
then for any strings w in Σ* and γ in Γ*, it is also true that


(q, xw, αγ) ⊢*_P (p, yw, βγ)


Note that if γ = ε, then we have a formal statement of principle (1) above, and
if w = ε, then we have the second principle.


PROOF: The proof is actually a very simple induction on the number of steps
in the sequence of ID’s that take $(q, xw, \alpha\gamma)$ to $(p, yw, \beta\gamma)$. Each of the moves
in the sequence $(q, x, \alpha) \vdash^*_P (p, y, \beta)$ is justified by the transitions of $P$ without
using $w$ and/or $\gamma$ in any way. Therefore, each move is still justified when these
strings are sitting on the input and stack.
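Theorem 6.5 can be spot-checked mechanically on the accepting computation for 1111 traced earlier. The Python sketch below is illustrative only and assumes an encoding of our own: ID's are `(state, remaining_input, stack)` triples with the stack top first, `"Z"` stands for $Z_0$, `""` for $\epsilon$, and `wwr_delta` is a hypothetical transcription of the $L_{wwr}$ PDA of Example 6.2. The helper `legal` checks that every consecutive pair of ID's is a single move, and `pad` appends $w$ to every input and places $\gamma$ below every stack, exactly as in the theorem.

```python
def wwr_delta():
    """Hypothetical transcription of the L_wwr PDA; "Z" is Z0, "" is epsilon."""
    delta = {}
    for top in "01Z":
        for a in "01":
            delta[("q0", a, top)] = {("q0", a + top)}  # read a, push it
        delta[("q0", "", top)] = {("q1", top)}         # guess: middle reached
    for a in "01":
        delta[("q1", a, a)] = {("q1", "")}             # read a, pop matching a
    delta[("q1", "", "Z")] = {("q2", "Z")}             # Z0 exposed: go to q2
    return delta


def one_move(delta, before, after):
    """True if ID `after` follows from ID `before` by a single move."""
    state, rest, stack = before
    if not stack:
        return False
    top, below = stack[0], stack[1:]
    options = [(p, rest, push + below)
               for p, push in delta.get((state, "", top), ())]
    if rest:
        options += [(p, rest[1:], push + below)
                    for p, push in delta.get((state, rest[0], top), ())]
    return after in options


def legal(delta, computation):
    """A computation is legal if each consecutive pair of ID's is one move."""
    return all(one_move(delta, a, b) for a, b in zip(computation, computation[1:]))


def pad(computation, w, gamma):
    """Append w to every input and put gamma below every stack."""
    return [(q, x + w, stack + gamma) for q, x, stack in computation]


run = [("q0", "1111", "Z"), ("q0", "111", "1Z"), ("q0", "11", "11Z"),
       ("q1", "11", "11Z"), ("q1", "1", "1Z"), ("q1", "", "Z"),
       ("q2", "", "Z")]
delta = wwr_delta()
print(legal(delta, run))                      # True
print(legal(delta, pad(run, "0110", "10")))  # True, as Theorem 6.5 predicts
```

The padded symbols are never at the top of the stack or at the front of the input when a move is made, which is exactly why every move remains justified.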


Incidentally, note that the converse of this theorem is false. There are things
that a PDA might be able to do by popping its stack, using some symbols of $\gamma$,
and then replacing them on the stack, that it couldn’t do if it never looked at
$\gamma$. However, as principle (3) states, we can remove unused input, since it is not
possible for a PDA to consume input symbols and then restore those symbols
to the input. We state principle (3) formally as:


Theorem 6.6: If $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ is a PDA, and


$$(q, xw, \alpha) \vdash^*_P (p, yw, \beta)$$


then it is also true that $(q, x, \alpha) \vdash^*_P (p, y, \beta)$.


6.1.5 Exercises for Section 6.1 


Exercise 6.1.1: Suppose the PDA $P = (\{q, p\}, \{0, 1\}, \{Z_0, X\}, \delta, q, Z_0, \{p\})$
has the following transition function:


1. $\delta(q, 0, Z_0) = \{(q, XZ_0)\}$.


ID’s for Finite Automata?


One might wonder why we did not introduce for finite automata a notation 
like the ID’s we use for PDA’s. Although a FA has no stack, we could use 
a pair (q,w), where q is the state and w the remaining input, as the ID of 
a finite automaton. 

While we could have done so, we would not glean any more information
from reachability among ID’s than we obtain from the $\hat{\delta}$ notation.
That is, for any finite automaton, we could show that $\hat{\delta}(q, w) = p$ if and
only if $(q, wx) \vdash^* (p, x)$ for all strings $x$. The fact that $x$ can be anything
we wish without influencing the behavior of the FA is a theorem analogous
to Theorems 6.5 and 6.6.


6. $\delta(p, 1, X) = \ldots$


7. $\delta(\ldots, 1, Z_0) = \{(p, \epsilon)\}$.


Starting from the initial ID $(q, w, Z_0)$, show all the reachable ID’s when the
input $w$ is:


* a) 01. 
b) 0011. 
c) 010. 


6.2 The Languages of a PDA 


We have assumed that a PDA accepts its input by consuming it and entering 
an accepting state. We call this approach “acceptance by final state.” There 
is a second approach to defining the language of a PDA that has important 
applications. We may also define for any PDA the language “accepted by 
empty stack,” that is, the set of strings that cause the PDA to empty its stack, 
starting from the initial ID. 

These two methods are equivalent, in the sense that a language L has a 
PDA that accepts it by final state if and only if L has a PDA that accepts it 
by empty stack. However, for a given PDA P, the languages that P accepts
by final state and by empty stack are usually different. We shall show in this
section how to convert a PDA accepting L by final state into another PDA that 
accepts L by empty stack, and vice-versa. 


6.2.1 Acceptance by Final State 


Let $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ be a PDA. Then $L(P)$, the language accepted by
$P$ by final state, is


$$\{w \mid (q_0, w, Z_0) \vdash^*_P (q, \epsilon, \alpha)\}$$


for some state $q$ in $F$ and any stack string $\alpha$. That is, starting in the initial
ID with $w$ waiting on the input, $P$ consumes $w$ from the input and enters an
accepting state. The contents of the stack at that time are irrelevant.


Example 6.7: We have claimed that the PDA of Example 6.2 accepts the
language $L_{wwr}$, the language of strings in $\{0, 1\}^*$ that have the form $ww^R$. Let
us see why that statement is true. The proof is an if-and-only-if statement: the
PDA $P$ of Example 6.2 accepts string $x$ by final state if and only if $x$ is of the
form $ww^R$.

(If) This part is easy; we have only to show the accepting computation of $P$. If
$x = ww^R$, then observe that


$$(q_0, ww^R, Z_0) \vdash^* (q_0, w^R, w^R Z_0) \vdash (q_1, w^R, w^R Z_0) \vdash^* (q_1, \epsilon, Z_0) \vdash (q_2, \epsilon, Z_0)$$


That is, one option the PDA has is to read $w$ from its input and store it on its
stack, in reverse. Next, it goes spontaneously to state $q_1$ and matches $w^R$ on
the input with the same string on its stack, and finally goes spontaneously to
state $q_2$.


(Only-if) This part is harder. First, observe that the only way to enter accepting
state $q_2$ is to be in state $q_1$ and have $Z_0$ at the top of the stack. Also, any
accepting computation of $P$ will start in state $q_0$, make one transition to $q_1$,
and never return to $q_0$. Thus, it is sufficient to find the conditions on $x$ such
that $(q_0, x, Z_0) \vdash^* (q_1, \epsilon, Z_0)$; these will be exactly the strings $x$ that $P$ accepts
by final state. We shall show by induction on $|x|$ the slightly more general
statement:


• If $(q_0, x, \alpha) \vdash^* (q_1, \epsilon, \alpha)$, then $x$ is of the form $ww^R$.


BASIS: If $x = \epsilon$, then $x$ is of the form $ww^R$ (with $w = \epsilon$). Thus, the conclusion is
true, so the statement is true. Note we do not have to argue that the hypothesis
$(q_0, \epsilon, \alpha) \vdash^* (q_1, \epsilon, \alpha)$ is true, although it is.


INDUCTION: Suppose $x = a_1 a_2 \cdots a_n$ for some $n > 0$. There are two moves
that $P$ can make from ID $(q_0, x, \alpha)$:

1. $(q_0, x, \alpha) \vdash (q_1, x, \alpha)$. Now $P$ can only pop the stack when it is in state
$q_1$. $P$ must pop the stack with every input symbol it reads, and $|x| > 0$.
Thus, if $(q_1, x, \alpha) \vdash^* (q_1, \epsilon, \beta)$, then $\beta$ will be shorter than $\alpha$ and cannot
be equal to $\alpha$.


2. $(q_0, a_1 a_2 \cdots a_n, \alpha) \vdash (q_0, a_2 \cdots a_n, a_1\alpha)$. Now the only way a sequence of
moves can end in $(q_1, \epsilon, \alpha)$ is if the last move is a pop:


$$(q_1, a_n, a_1\alpha) \vdash (q_1, \epsilon, \alpha)$$


In that case, it must be that $a_1 = a_n$. We also know that


$$(q_0, a_2 \cdots a_n, a_1\alpha) \vdash^* (q_1, a_n, a_1\alpha)$$


By Theorem 6.6, we can remove the symbol $a_n$ from the end of the input,
since it is not used. Thus,


$$(q_0, a_2 \cdots a_{n-1}, a_1\alpha) \vdash^* (q_1, \epsilon, a_1\alpha)$$


Since the input for this sequence is shorter than $n$, we may apply the
inductive hypothesis and conclude that $a_2 \cdots a_{n-1}$ is of the form $yy^R$ for
some $y$. Since $x = a_1 yy^R a_n$, and we know $a_1 = a_n$, we conclude that $x$ is
of the form $ww^R$; specifically $w = a_1 y$.


The above is the heart of the proof that the only way to accept $x$ is for $x$
to be equal to $ww^R$ for some $w$. Thus, we have the “only-if” part of the proof,
which, with the “if” part proved earlier, tells us that $P$ accepts exactly those
strings in $L_{wwr}$.
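The claim of Example 6.7 can also be checked by brute force on short strings; this is evidence for small lengths only, not a substitute for the proof. The Python sketch below assumes an encoding of our own (ID's as `(state, remaining_input, stack)` with the stack top first, `"Z"` for $Z_0$, `""` for $\epsilon$) and a hypothetical transcription `wwr_delta` of the transitions of Example 6.2. Acceptance follows the definition of Section 6.2.1: some ID $(q_2, \epsilon, \alpha)$ must be reachable.

```python
from collections import deque
from itertools import product


def wwr_delta():
    """Hypothetical transcription of the L_wwr PDA; "Z" is Z0, "" is epsilon."""
    delta = {}
    for top in "01Z":
        for a in "01":
            delta[("q0", a, top)] = {("q0", a + top)}  # read a, push it
        delta[("q0", "", top)] = {("q1", top)}         # guess: middle reached
    for a in "01":
        delta[("q1", a, a)] = {("q1", "")}             # read a, pop matching a
    delta[("q1", "", "Z")] = {("q2", "Z")}             # Z0 exposed: go to q2
    return delta


def moves(delta, ident):
    """Every ID reachable from `ident` in exactly one move."""
    state, rest, stack = ident
    if not stack:
        return []
    top, below = stack[0], stack[1:]
    result = [(p, rest, push + below)
              for p, push in delta.get((state, "", top), ())]
    if rest:
        result += [(p, rest[1:], push + below)
                   for p, push in delta.get((state, rest[0], top), ())]
    return result


def accepts_by_final_state(delta, w, final=("q2",)):
    """w is in L(P) iff some ID (q, "", alpha) with q accepting is reachable."""
    start = ("q0", w, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        ident = queue.popleft()
        if ident[0] in final and ident[1] == "":
            return True
        for nxt in moves(delta, ident):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_wwr(x):
    """x = w w^R exactly when x is an even-length palindrome."""
    return len(x) % 2 == 0 and x == x[::-1]


delta = wwr_delta()
for n in range(7):
    for letters in product("01", repeat=n):
        x = "".join(letters)
        assert accepts_by_final_state(delta, x) == is_wwr(x), x
print("L(P) agrees with L_wwr on all strings of length <= 6")
```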


6.2.2 Acceptance by Empty Stack

For each PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$, we also define


$$N(P) = \{w \mid (q_0, w, Z_0) \vdash^* (q, \epsilon, \epsilon)\}$$


for any state $q$. That is, $N(P)$ is the set of inputs $w$ that $P$ can consume and
at the same time empty its stack.²


Example 6.8: The PDA $P$ of Example 6.2 never empties its stack, so $N(P) = \emptyset$. However, a small modification will allow $P$ to accept $L_{wwr}$ by empty stack
as well as by final state. Instead of the transition $\delta(q_1, \epsilon, Z_0) = \{(q_2, Z_0)\}$, use
$\delta(q_1, \epsilon, Z_0) = \{(q_2, \epsilon)\}$. Now, $P$ pops the last symbol off its stack as it accepts,
and $L(P) = N(P) = L_{wwr}$.
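The effect of the one-transition change in Example 6.8 can be checked on short strings. In this Python sketch (an illustration under our own encoding: stack top first, `"Z"` for $Z_0$, `""` for $\epsilon$, and `wwr_delta` a hypothetical transcription of Example 6.2's transitions), `in_L` tests acceptance by final state and `in_N` tests acceptance by empty stack, following the two definitions above.

```python
from collections import deque
from itertools import product


def wwr_delta():
    """Hypothetical transcription of the L_wwr PDA; "Z" is Z0, "" is epsilon."""
    delta = {}
    for top in "01Z":
        for a in "01":
            delta[("q0", a, top)] = {("q0", a + top)}  # read a, push it
        delta[("q0", "", top)] = {("q1", top)}         # guess: middle reached
    for a in "01":
        delta[("q1", a, a)] = {("q1", "")}             # read a, pop matching a
    delta[("q1", "", "Z")] = {("q2", "Z")}             # Z0 exposed: go to q2
    return delta


def moves(delta, ident):
    """Every ID reachable from `ident` in exactly one move."""
    state, rest, stack = ident
    if not stack:
        return []
    top, below = stack[0], stack[1:]
    result = [(p, rest, push + below)
              for p, push in delta.get((state, "", top), ())]
    if rest:
        result += [(p, rest[1:], push + below)
                   for p, push in delta.get((state, rest[0], top), ())]
    return result


def search(delta, w, done):
    """True if an ID satisfying `done` is reachable from (q0, w, Z0)."""
    start = ("q0", w, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        ident = queue.popleft()
        if done(ident):
            return True
        for nxt in moves(delta, ident):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def in_L(delta, w):
    """Acceptance by final state: reach q2 with the input consumed."""
    return search(delta, w, lambda d: d[0] == "q2" and d[1] == "")


def in_N(delta, w):
    """Acceptance by empty stack: input consumed and stack empty."""
    return search(delta, w, lambda d: d[1] == "" and d[2] == "")


original = wwr_delta()
modified = dict(original)
modified[("q1", "", "Z")] = {("q2", "")}  # pop Z0 on the way to q2

for n in range(7):
    for letters in product("01", repeat=n):
        x = "".join(letters)
        expected = len(x) % 2 == 0 and x == x[::-1]
        assert not in_N(original, x)  # the original never empties its stack
        assert in_L(modified, x) == in_N(modified, x) == expected, x
print("checked all strings of length <= 6")
```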


Since the set of accepting states is irrelevant, we shall sometimes leave off 
the last (seventh) component from the specification of a PDA P, if all we care 
about is the language that $P$ accepts by empty stack. Thus, we would write $P$
as a six-tuple $(Q, \Sigma, \Gamma, \delta, q_0, Z_0)$.


²The N in N(P) stands for “null stack,” a synonym for “empty stack.”
6.2.3 From Empty Stack to Final State


We shall show that the class of languages that are $L(P)$ for some PDA $P$ is
the same as the class of languages that are $N(P)$ for some PDA $P$. This class
is also exactly the context-free languages, as we shall see in Section 6.3. Our
first construction shows how to take a PDA $P_N$ that accepts a language $L$ by
empty stack and construct a PDA $P_F$ that accepts $L$ by final state.


Theorem 6.9: If $L = N(P_N)$ for some PDA $P_N = (Q, \Sigma, \Gamma, \delta_N, q_0, Z_0)$, then
there is a PDA $P_F$ such that $L = L(P_F)$.


PROOF: The idea behind the proof is in Fig. 6.4. We use a new symbol $X_0$,
which must not be a symbol of $\Gamma$; $X_0$ is both the start symbol of $P_F$ and a
marker on the bottom of the stack that lets us know when $P_N$ has reached an
empty stack. That is, if $P_F$ sees $X_0$ on top of its stack, then it knows that $P_N$
would empty its stack on the same input.


Figure 6.4: $P_F$ simulates $P_N$ and accepts if $P_N$ empties its stack


We also need a new start state, $p_0$, whose sole function is to push $Z_0$, the
start symbol of $P_N$, onto the top of the stack and enter state $q_0$, the start
state of $P_N$. Then, $P_F$ simulates $P_N$, until the stack of $P_N$ is empty, which $P_F$
detects because it sees $X_0$ on the top of the stack. Finally, we need another
new state, $p_f$, which is the accepting state of $P_F$; this PDA transfers to state
$p_f$ whenever it discovers that $P_N$ would have emptied its stack.

The specification of $P_F$ is as follows:


$$P_F = (Q \cup \{p_0, p_f\}, \Sigma, \Gamma \cup \{X_0\}, \delta_F, p_0, X_0, \{p_f\})$$

where $\delta_F$ is defined by:


1. $\delta_F(p_0, \epsilon, X_0) = \{(q_0, Z_0X_0)\}$. In its start state, $P_F$ makes a spontaneous
transition to the start state of $P_N$, pushing its start symbol $Z_0$ onto the
stack.


2. For all states $q$ in $Q$, inputs $a$ in $\Sigma$ or $a = \epsilon$, and stack symbols $Y$ in $\Gamma$,
$\delta_F(q, a, Y)$ contains all the pairs in $\delta_N(q, a, Y)$.


3. In addition to rule (2), $\delta_F(q, \epsilon, X_0)$ contains $(p_f, \epsilon)$ for every state $q$ in $Q$.
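Rules (1) through (3) translate directly into code. The Python sketch below assumes transition tables are dictionaries mapping `(state, input_symbol_or_"", top)` to sets of `(state, pushed_string)` pairs, with single-character stack symbols written top first; the default names `p0`, `pf`, and `"X"` (for $X_0$) are our own and are assumed not to clash with the states and stack symbols of $P_N$. As a usage check, it is applied to the if/else PDA $P_N$ of Example 6.10 and compared with the $\delta_F$ given there.

```python
def empty_stack_to_final_state(delta_n, states, q0, z0,
                               p0="p0", pf="pf", x0="X"):
    """Return (delta_F, start state, start symbol, accepting states)."""
    # Rule (1): push Z0 above the new bottom marker X0 and enter q0.
    delta_f = {(p0, "", x0): {(q0, z0 + x0)}}
    # Rule (2): P_F has every move of P_N.
    for key, pairs in delta_n.items():
        delta_f[key] = set(pairs)
    # Rule (3): X0 exposed in any state of P_N means P_N emptied its stack.
    for q in states:
        delta_f.setdefault((q, "", x0), set()).add((pf, ""))
    return delta_f, p0, x0, {pf}


# P_N of Example 6.10: reading i pushes a Z, reading e pops a Z.
delta_n = {("q", "i", "Z"): {("q", "ZZ")}, ("q", "e", "Z"): {("q", "")}}
delta_f, start, bottom, accepting = empty_stack_to_final_state(
    delta_n, {"q"}, "q", "Z", p0="p", pf="r")
for key in sorted(delta_f):
    print(key, delta_f[key])
```

Because $X_0$ is not in $\Gamma$, rule (3) never collides with a transition copied by rule (2); `setdefault` is used only so the sketch stays correct if it is handed a table that already mentions the marker.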


We must show that $w$ is in $L(P_F)$ if and only if $w$ is in $N(P_N)$.


(If) We are given that $(q_0, w, Z_0) \vdash^*_{P_N} (q, \epsilon, \epsilon)$ for some state $q$. Theorem 6.5 lets
us insert $X_0$ at the bottom of the stack and conclude $(q_0, w, Z_0X_0) \vdash^*_{P_N} (q, \epsilon, X_0)$.
Since by rule (2) above, $P_F$ has all the moves of $P_N$, we may also conclude that
$(q_0, w, Z_0X_0) \vdash^*_{P_F} (q, \epsilon, X_0)$. If we put this sequence of moves together with the
initial and final moves from rules (1) and (3) above, we get:


$$(p_0, w, X_0) \vdash_{P_F} (q_0, w, Z_0X_0) \vdash^*_{P_F} (q, \epsilon, X_0) \vdash_{P_F} (p_f, \epsilon, \epsilon) \qquad (6.1)$$


Thus, $P_F$ accepts $w$ by final state.


(Only-if) The converse requires only that we observe the additional transitions
of rules (1) and (3) give us very limited ways to accept $w$ by final state. We must
use rule (3) at the last step, and we can only use that rule if the stack of $P_F$
contains only $X_0$. No $X_0$’s ever appear on the stack except at the bottommost
position. Further, rule (1) is only used at the first step, and it must be used at
the first step.

Thus, any computation of $P_F$ that accepts $w$ must look like sequence (6.1).
Moreover, the middle of the computation (all but the first and last steps)
must also be a computation of $P_N$ with $X_0$ below the stack. The reason is that,
except for the first and last steps, $P_F$ cannot use any transition that is not also
a transition of $P_N$, and $X_0$ cannot be exposed or the computation would end at
the next step. We conclude that $(q_0, w, Z_0) \vdash^*_{P_N} (q, \epsilon, \epsilon)$. That is, $w$ is in $N(P_N)$.


Example 6.10: Let us design a PDA that processes sequences of if’s and
else’s in a C program, where $i$ stands for if and $e$ stands for else. Recall
from Section 5.3.1 that there is a problem whenever the number of else’s in
any prefix exceeds the number of if’s, because then we cannot match each
else against its previous if. Thus, we shall use a stack symbol $Z$ to count the
difference between the number of $i$’s seen so far and the number of $e$’s. This
simple one-state PDA is suggested by the transition diagram of Fig. 6.5.

We shall push another $Z$ whenever we see an $i$ and pop a $Z$ whenever we see
an $e$. Since we start with one $Z$ on the stack, we actually follow the rule that if
the stack is $Z^n$, then there have been $n - 1$ more $i$’s than $e$’s. In particular, if
the stack is empty, then we have seen one more $e$ than $i$, and the input read so
far has just become illegal for the first time. It is these strings that our PDA
accepts by empty stack. The formal specification of $P_N$ is:


$$P_N = (\{q\}, \{i, e\}, \{Z\}, \delta_N, q, Z)$$
Figure 6.5: A PDA that accepts the if/else errors by empty stack


where $\delta_N$ is defined by:


1. $\delta_N(q, i, Z) = \{(q, ZZ)\}$. This rule pushes a $Z$ when we see an $i$.


2. $\delta_N(q, e, Z) = \{(q, \epsilon)\}$. This rule pops a $Z$ when we see an $e$.




Figure 6.6: Construction of a PDA accepting by final state from the PDA of 
Fig. 6.5 


Now, let us construct from $P_N$ a PDA $P_F$ that accepts the same language
by final state; the transition diagram for $P_F$ is shown in Fig. 6.6.³ We introduce
a new start state $p$ and an accepting state $r$. We shall use $X_0$ as the bottom-of-stack marker. $P_F$ is formally defined:


$$P_F = (\{p, q, r\}, \{i, e\}, \{Z, X_0\}, \delta_F, p, X_0, \{r\})$$


where $\delta_F$ consists of:


1. $\delta_F(p, \epsilon, X_0) = \{(q, ZX_0)\}$. This rule starts $P_F$ simulating $P_N$, with $X_0$ as
a bottom-of-stack marker.


2. $\delta_F(q, i, Z) = \{(q, ZZ)\}$. This rule pushes a $Z$ when we see an $i$; it simulates $P_N$.


3. $\delta_F(q, e, Z) = \{(q, \epsilon)\}$. This rule pops a $Z$ when we see an $e$; it also
simulates $P_N$.


4. $\delta_F(q, \epsilon, X_0) = \{(r, \epsilon)\}$. That is, $P_F$ accepts when the simulated $P_N$ would
have emptied its stack.
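To see that $P_N$ (by empty stack) and $P_F$ (by final state) really accept the same strings, the Python sketch below transcribes both transition functions of Example 6.10 and compares them on every string over $\{i, e\}$ of length at most 6. The encoding is our own: ID's are `(state, remaining_input, stack)` with the stack top first, `"X"` stands for $X_0$, and `""` for $\epsilon$. The helper `first_error_at_end` is a hypothetical reference predicate expressing the intended language: the $e$'s outnumber the $i$'s for the first time exactly at the last symbol.

```python
from collections import deque
from itertools import product

# Example 6.10 transcribed; "X" stands for X0 and "" for epsilon.
DELTA_N = {("q", "i", "Z"): {("q", "ZZ")}, ("q", "e", "Z"): {("q", "")}}
DELTA_F = {("p", "", "X"): {("q", "ZX")},
           ("q", "i", "Z"): {("q", "ZZ")},
           ("q", "e", "Z"): {("q", "")},
           ("q", "", "X"): {("r", "")}}


def moves(delta, ident):
    """Every ID reachable from `ident` in exactly one move."""
    state, rest, stack = ident
    if not stack:
        return []
    top, below = stack[0], stack[1:]
    result = [(p, rest, push + below)
              for p, push in delta.get((state, "", top), ())]
    if rest:
        result += [(p, rest[1:], push + below)
                   for p, push in delta.get((state, rest[0], top), ())]
    return result


def search(delta, start, done):
    """True if an ID satisfying `done` is reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        ident = queue.popleft()
        if done(ident):
            return True
        for nxt in moves(delta, ident):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def in_N(w):
    """Acceptance of P_N by empty stack."""
    return search(DELTA_N, ("q", w, "Z"), lambda d: d[1] == "" and d[2] == "")


def in_L(w):
    """Acceptance of P_F by final state r."""
    return search(DELTA_F, ("p", w, "X"), lambda d: d[0] == "r" and d[1] == "")


def first_error_at_end(w):
    """True if the e's first outnumber the i's exactly at the last symbol."""
    balance = 0
    for k, c in enumerate(w):
        balance += 1 if c == "i" else -1
        if balance < 0:
            return k == len(w) - 1
    return False


for n in range(7):
    for letters in product("ie", repeat=n):
        w = "".join(letters)
        assert in_N(w) == in_L(w) == first_error_at_end(w), w
print(in_N("iee"), in_N("ie"))  # True False
```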


³Do not be concerned that we are using new states $p$ and $r$ here, while the construction
in Theorem 6.9 used $p_0$ and $p_f$. Names of states are arbitrary, of course.

6.2.4 From Final State to Empty Stack


Now, let us go in the opposite direction: take a PDA $P_F$ that accepts a language
$L$ by final state and construct another PDA $P_N$ that accepts $L$ by empty stack.
The construction is simple and is suggested in Fig. 6.7. From each accepting
state of $P_F$, add a transition on $\epsilon$ to a new state $p$. When in state $p$, $P_N$ pops its
stack and does not consume any input. Thus, whenever $P_F$ enters an accepting
state after consuming input $w$, $P_N$ will empty its stack after consuming $w$.

To avoid simulating a situation where $P_F$ accidentally empties its stack
without accepting, $P_N$ must also use a marker $X_0$ on the bottom of its stack.
The marker is $P_N$’s start symbol, and like the construction of Theorem 6.9, $P_N$
must start in a new state $p_0$, whose sole function is to push the start symbol of
$P_F$ on the stack and go to the start state of $P_F$. The construction is sketched
in Fig. 6.7, and we give it formally in the next theorem.


Figure 6.7: $P_N$ simulates $P_F$ and empties its stack when and only when $P_F$
enters an accepting state


Theorem 6.11: Let $L$ be $L(P_F)$ for some PDA $P_F = (Q, \Sigma, \Gamma, \delta_F, q_0, Z_0, F)$.
Then there is a PDA $P_N$ such that $L = N(P_N)$.


PROOF: The construction is as suggested in Fig. 6.7. Let


$$P_N = (Q \cup \{p_0, p\}, \Sigma, \Gamma \cup \{X_0\}, \delta_N, p_0, X_0)$$

where $\delta_N$ is defined by:


1. $\delta_N(p_0, \epsilon, X_0) = \{(q_0, Z_0X_0)\}$. We start by pushing the start symbol of $P_F$
onto the stack and going to the start state of $P_F$.


2. For all states $q$ in $Q$, input symbols $a$ in $\Sigma$ or $a = \epsilon$, and $Y$ in $\Gamma$, $\delta_N(q, a, Y)$
contains every pair that is in $\delta_F(q, a, Y)$. That is, $P_N$ simulates $P_F$.


3. For all accepting states $q$ in $F$ and stack symbols $Y$ in $\Gamma$ or $Y = X_0$,
$\delta_N(q, \epsilon, Y)$ contains $(p, \epsilon)$. By this rule, whenever $P_F$ accepts, $P_N$ can
start emptying its stack without consuming any more input.


4. For all stack symbols $Y$ in $\Gamma$ or $Y = X_0$, $\delta_N(p, \epsilon, Y) = \{(p, \epsilon)\}$. Once in
state $p$, which only occurs when $P_F$ has accepted, $P_N$ pops every symbol
on its stack, until the stack is empty. No further input is consumed.
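Rules (1) through (4) can likewise be written as a small function. The Python sketch below is an illustration under assumptions of our own: transition tables map `(state, input_symbol_or_"", top)` to sets of `(state, pushed_string)` pairs with single-character stack symbols written top first, and the names `p0`, `p`, and `"X"` (for $X_0$) are assumed fresh. As a usage check, it converts a hypothetical transcription `wwr_delta` of the final-state PDA of Example 6.2 and confirms that the result accepts $L_{wwr}$ by empty stack on all short strings.

```python
from collections import deque
from itertools import product


def wwr_delta():
    """Hypothetical transcription of the L_wwr PDA; "Z" is Z0, "" is epsilon."""
    delta = {}
    for top in "01Z":
        for a in "01":
            delta[("q0", a, top)] = {("q0", a + top)}
        delta[("q0", "", top)] = {("q1", top)}
    for a in "01":
        delta[("q1", a, a)] = {("q1", "")}
    delta[("q1", "", "Z")] = {("q2", "Z")}
    return delta


def final_state_to_empty_stack(delta_f, stack_symbols, q0, z0, final,
                               p0="p0", p="p", x0="X"):
    """Return (delta_N, start state, start symbol) following rules (1)-(4)."""
    delta_n = {(p0, "", x0): {(q0, z0 + x0)}}                 # rule (1)
    for key, pairs in delta_f.items():                           # rule (2)
        delta_n[key] = set(pairs)
    for y in set(stack_symbols) | {x0}:
        for q in final:                                          # rule (3)
            delta_n.setdefault((q, "", y), set()).add((p, ""))
        delta_n[(p, "", y)] = {(p, "")}                          # rule (4)
    return delta_n, p0, x0


def moves(delta, ident):
    """Every ID reachable from `ident` in exactly one move."""
    state, rest, stack = ident
    if not stack:
        return []
    top, below = stack[0], stack[1:]
    result = [(q, rest, push + below)
              for q, push in delta.get((state, "", top), ())]
    if rest:
        result += [(q, rest[1:], push + below)
                   for q, push in delta.get((state, rest[0], top), ())]
    return result


def empties_stack(delta, start):
    """True if some ID (state, "", "") is reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        ident = queue.popleft()
        if ident[1] == "" and ident[2] == "":
            return True
        for nxt in moves(delta, ident):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


delta_n, p0, x0 = final_state_to_empty_stack(wwr_delta(), "01Z", "q0", "Z", {"q2"})
for n in range(7):
    for letters in product("01", repeat=n):
        x = "".join(letters)
        expected = len(x) % 2 == 0 and x == x[::-1]
        assert empties_stack(delta_n, (p0, x, x0)) == expected, x
print("N(P_N) agrees with L_wwr on all strings of length <= 6")
```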
Now, we must prove that $w$ is in $N(P_N)$ if and only if $w$ is in $L(P_F)$.
The ideas are similar to the proof for Theorem 6.9. The “if” part is a direct
simulation, and the “only-if” part requires that we examine the limited number
of things that the constructed PDA $P_N$ can do.


(If) Suppose $(q_0, w, Z_0) \vdash^*_{P_F} (q, \epsilon, \alpha)$ for some accepting state $q$ and stack string
$\alpha$. Using the fact that every transition of $P_F$ is a move of $P_N$, and invoking
Theorem 6.5 to allow us to keep $X_0$ below the symbols of $\Gamma$ on the stack, we
know that $(q_0, w, Z_0X_0) \vdash^*_{P_N} (q, \epsilon, \alpha X_0)$. Then $P_N$ can do the following:


$$(p_0, w, X_0) \vdash_{P_N} (q_0, w, Z_0X_0) \vdash^*_{P_N} (q, \epsilon, \alpha X_0) \vdash^*_{P_N} (p, \epsilon, \epsilon)$$

The first move is by rule (1) of the construction of $P_N$, while the last sequence
of moves is by rules (3) and (4). Thus, $w$ is accepted by $P_N$, by empty stack.


(Only-if) The only way $P_N$ can empty its stack is by entering state $p$, since
$X_0$ is sitting at the bottom of stack and $X_0$ is not a symbol on which $P_F$ has
any moves. The only way $P_N$ can enter state $p$ is if the simulated $P_F$ enters
an accepting state. The first move of $P_N$ is surely the move given in rule (1).
Thus, every accepting computation of $P_N$ looks like


$$(p_0, w, X_0) \vdash_{P_N} (q_0, w, Z_0X_0) \vdash^*_{P_N} (q, \epsilon, \alpha X_0) \vdash^*_{P_N} (p, \epsilon, \epsilon)$$

where $q$ is an accepting state of $P_F$.

Moreover, between ID’s $(q_0, w, Z_0X_0)$ and $(q, \epsilon, \alpha X_0)$, all the moves are
moves of $P_F$. In particular, $X_0$ was never the top stack symbol prior to reaching
ID $(q, \epsilon, \alpha X_0)$.⁴ Thus, we conclude that the same computation can occur in $P_F$,
without the $X_0$ on the stack; that is, $(q_0, w, Z_0) \vdash^*_{P_F} (q, \epsilon, \alpha)$. Now we see that
$P_F$ accepts $w$ by final state, so $w$ is in $L(P_F)$.


6.2.5 Exercises for Section 6.2 


Exercise 6.2.1: Design a PDA to accept each of the following languages. You 
may accept either by final state or by empty stack, whichever is more convenient. 


* a) $\{0^n 1^n \mid n \ge 1\}$.


b) The set of all strings of 0’s and 1’s such that no prefix has more 1’s than 
0’s. 


c) The set of all strings of 0’s and 1’s with an equal number of 0’s and 1’s. 
Exercise 6.2.2: Design a PDA to accept each of the following languages. 


* a) $\{a^i b^j c^k \mid i = j \text{ or } j = k\}$. Note that this language is different from that
of Exercise 5.1.1(b). 


⁴Although $\alpha$ could be $\epsilon$, in which case $P_F$ has emptied its stack at the same time it accepts.


b) The set of all strings with twice as many 0’s as 1’s.


Exercise 6.2.3: Design a PDA to accept each of the following languages. 
a) $\{a^i b^j c^k \mid i \ne j \text{ or } j \ne k\}$.


b) The set of all strings of a’s and b’s that are not of the form ww, that is, 
not equal to any string repeated. 


Exercise 6.2.4: Let $P$ be a PDA with empty-stack language $L = N(P)$, and
suppose that $\epsilon$ is not in $L$. Describe how you would modify $P$ so that it accepts
$L \cup \{\epsilon\}$ by empty stack.


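Exercise 6.2.5: PDA $P = (\{q_0, q_1, q_2, q_3, f\}, \{a, b\}, \{Z_0, A, B\}, \delta, q_0, Z_0, \{f\})$
has the following rules defining $\delta$:


$$
\begin{array}{lll}
\delta(q_0, a, Z_0) = (q_1, AAZ_0) & \delta(q_0, b, Z_0) = (q_2, BZ_0) & \delta(q_0, \epsilon, Z_0) = (f, \epsilon) \\
\delta(q_1, a, A) = (q_1, AAA) & \delta(q_1, b, A) = (q_1, \epsilon) & \delta(q_1, \epsilon, Z_0) = (q_0, Z_0) \\
\delta(q_2, a, B) = (q_3, \epsilon) & \delta(q_2, b, B) = (q_2, BB) & \delta(q_2, \epsilon, Z_0) = (q_0, Z_0) \\
\delta(q_3, \epsilon, B) = (q_2, \epsilon) & \delta(q_3, \epsilon, Z_0) = (q_1, AZ_0) &
\end{array}
$$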


Note that, since each of the sets above has only one choice of move, we have 
omitted the set brackets from each of the rules. 


* a) Give an execution trace (sequence of ID’s) showing that string bab is in 
L(P). 


b) Give an execution trace showing that abb is in L(P). 
c) Give the contents of the stack after $P$ has read $b^7a^4$ from its input.


! d) Informally describe L(P). 


Exercise 6.2.6: Consider the PDA P from Exercise 6.1.1. 


a) Convert $P$ to another PDA $P_1$ that accepts by empty stack the same
language that $P$ accepts by final state; i.e., $N(P_1) = L(P)$.


b) Find a PDA $P_2$ such that $L(P_2) = N(P)$; i.e., $P_2$ accepts by final state
what $P$ accepts by empty stack.


Exercise 6.2.7: Show that if $P$ is a PDA, then there is a PDA $P_2$ with only
two stack symbols, such that $L(P_2) = L(P)$. Hint: Binary-code the stack
alphabet of P. 


Exercise 6.2.8: A PDA is called restricted if on any transition it can increase
the height of the stack by at most one symbol. That is, whenever $\delta(q, a, Z)$
contains $(p, \gamma)$, it must be that $|\gamma| \le 2$. Show that if $P$ is a PDA, then there is
a restricted PDA $P_3$ such that $L(P) = L(P_3)$.
6.3 Equivalence of PDA’s and CFG’s


Now, we shall demonstrate that the languages defined by PDA’s are exactly the 
context-free languages. The plan of attack is suggested by Fig. 6.8. The goal 
is to prove that the following three classes of languages: 


1. The context-free languages, i.e., the languages defined by CFG’s. 
2. The languages that are accepted by final state by some PDA. 
3. The languages that are accepted by empty stack by some PDA. 


are all the same class. We have already shown that (2) and (3) are the same. 
It turns out to be easiest next to show that (1) and (3) are the same, thus 
implying the equivalence of all three. 



Figure 6.8: Organization of constructions showing equivalence of three ways of 
defining the CFL’s 


6.3.1 From Grammars to Pushdown Automata 


Given a CFG $G$, we construct a PDA that simulates the leftmost derivations
of $G$. Any left-sentential form that is not a terminal string can be written as
$xA\alpha$, where $A$ is the leftmost variable, $x$ is whatever terminals appear to its
left, and $\alpha$ is the string of terminals and variables that appear to the right of $A$.
We call $A\alpha$ the tail of this left-sentential form. If a left-sentential form consists
of terminals only, then its tail is $\epsilon$.

The idea behind the construction of a PDA from a grammar is to have 
the PDA simulate the sequence of left-sentential forms that the grammar uses 
to generate a given terminal string $w$. The tail of each sentential form $xA\alpha$
appears on the stack, with $A$ at the top. At that time, $x$ will be “represented”
by our having consumed x from the input, leaving whatever of w follows its 
prefix x. That is, if w = xy, then y will remain on the input. 

Suppose the PDA is in an ID $(q, y, A\alpha)$, representing left-sentential form
$xA\alpha$. It guesses the production to use to expand $A$, say $A \to \beta$. The move of
the PDA is to replace $A$ on the top of the stack by $\beta$, entering ID $(q, y, \beta\alpha)$.
Note that there is only one state, $q$, for this PDA.

Now $(q, y, \beta\alpha)$ may not be a representation of the next left-sentential form,
because $\beta$ may have a prefix of terminals. In fact, $\beta$ may have no variables at
all, and $\alpha$ may have a prefix of terminals. Whatever terminals appear at the
beginning of $\beta\alpha$ need to be removed, to expose the next variable at the top of
the stack. These terminals are compared against the next input symbols, to
make sure our guesses at the leftmost derivation of input string $w$ are correct;
if not, this branch of the PDA dies.

If we succeed in this way to guess a leftmost derivation of w, then we shall 
eventually reach the left-sentential form w. At that point, all the symbols on 
the stack have either been expanded (if they are variables) or matched against 
the input (if they are terminals). The stack is empty, and we accept by empty 
stack. 


The above informal construction can be made precise as follows. Let
$G = (V, T, Q, S)$ be a CFG. Construct the PDA $P$ that accepts $L(G)$ by empty stack
as follows:


$$P = (\{q\}, T, V \cup T, \delta, q, S)$$


where transition function $\delta$ is defined by:


1. For each variable $A$,
$$\delta(q, \epsilon, A) = \{(q, \beta) \mid A \to \beta \text{ is a production of } G\}$$


2. For each terminal $a$, $\delta(q, a, a) = \{(q, \epsilon)\}$.
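The two rules above can be sketched in Python as follows. Assumptions (ours, not the book's): grammar symbols are single characters, productions are `(head, body)` pairs with the body as a string, stacks are strings with the top first, and `""` stands for $\epsilon$. To keep the search finite, `accepts_by_empty_stack` discards any ID whose stack is longer than its remaining input; that pruning is only sound for grammars without $\epsilon$-productions, where every stack symbol must eventually consume at least one input symbol. The example grammar $S \to 0S1 \mid 01$ is hypothetical, chosen only for illustration.

```python
from collections import deque


def grammar_to_pda(productions, terminals):
    """Single-state PDA of the construction; all symbols are single characters."""
    delta = {}
    for head, body in productions:            # rule (1): expand a variable
        delta.setdefault(("q", "", head), set()).add(("q", body))
    for a in terminals:                       # rule (2): match a terminal
        delta[("q", a, a)] = {("q", "")}
    return delta


def accepts_by_empty_stack(delta, w, start_symbol):
    """Search the ID's of P from (q, w, S) for (q, "", "").

    Pruning assumption: the grammar has no epsilon-productions, so an ID
    whose stack is longer than its remaining input can never succeed.
    """
    start = ("q", w, start_symbol)
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and stack == "":
            return True
        if not stack or len(stack) > len(rest):
            continue  # stuck (empty stack, input left) or hopeless
        top, below = stack[0], stack[1:]
        successors = [(p, rest, push + below)
                      for p, push in delta.get((state, "", top), ())]
        if rest:
            successors += [(p, rest[1:], push + below)
                           for p, push in delta.get((state, rest[0], top), ())]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical example grammar: S -> 0S1 | 01, generating 0^n 1^n for n >= 1.
delta = grammar_to_pda([("S", "0S1"), ("S", "01")], "01")
print([accepts_by_empty_stack(delta, w, "S") for w in ["01", "0011", "011", ""]])
# [True, True, False, False]
```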


Example 6.12: Let us convert the expression grammar of Fig. 5.2 to a PDA. 
Recall this grammar is: 


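$$
\begin{array}{lcl}
I & \to & a \mid b \mid Ia \mid Ib \mid I0 \mid I1 \\
E & \to & I \mid E * E \mid E + E \mid (E)
\end{array}
$$


The set of input symbols for the PDA is $\{a, b, 0, 1, (, ), +, *\}$. These eight symbols and the symbols $I$ and $E$ form the stack alphabet. The transition function
for the PDA is:


a) $\delta(q, \epsilon, I) = \{(q, a), (q, b), (q, Ia), (q, Ib), (q, I0), (q, I1)\}$.

b) $\delta(q, \epsilon, E) = \{(q, I), (q, E + E), (q, E * E), (q, (E))\}$.

c) $\delta(q, a, a) = \{(q, \epsilon)\}$; $\delta(q, b, b) = \{(q, \epsilon)\}$; $\delta(q, 0, 0) = \{(q, \epsilon)\}$;
$\delta(q, 1, 1) = \{(q, \epsilon)\}$; $\delta(q, (, () = \{(q, \epsilon)\}$; $\delta(q, ), )) = \{(q, \epsilon)\}$;
$\delta(q, +, +) = \{(q, \epsilon)\}$; $\delta(q, *, *) = \{(q, \epsilon)\}$.


Note that (a) and (b) come from rule (1), while the eight transitions of (c) come
from rule (2). Also, $\delta$ is empty except as defined by (a) through (c).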
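The PDA of Example 6.12 can be built and exercised with a short Python sketch. It assumes an encoding of our own (single-character grammar symbols, stacks as strings with the top first, `""` for $\epsilon$). Because the expression grammar has no $\epsilon$-productions, the search may safely discard any ID whose stack is longer than its remaining input, which keeps it finite despite left-recursive productions such as $E \to E + E$. The test strings are illustrative choices.

```python
from collections import deque

# Expression grammar of Example 6.12, one production per (head, body) pair.
PRODUCTIONS = ([("I", b) for b in ["a", "b", "Ia", "Ib", "I0", "I1"]]
               + [("E", b) for b in ["I", "E*E", "E+E", "(E)"]])
TERMINALS = "ab01()+*"


def grammar_to_pda(productions, terminals):
    """Single-state PDA of the construction; all symbols are single characters."""
    delta = {}
    for head, body in productions:            # rule (1): expand a variable
        delta.setdefault(("q", "", head), set()).add(("q", body))
    for a in terminals:                       # rule (2): match a terminal
        delta[("q", a, a)] = {("q", "")}
    return delta


def accepts_by_empty_stack(delta, w, start_symbol):
    """Search for (q, "", "") from (q, w, S); assumes no epsilon-productions."""
    start = ("q", w, start_symbol)
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and stack == "":
            return True
        if not stack or len(stack) > len(rest):
            continue  # stuck, or more symbols to consume than input left
        top, below = stack[0], stack[1:]
        successors = [(p, rest, push + below)
                      for p, push in delta.get((state, "", top), ())]
        if rest:
            successors += [(p, rest[1:], push + below)
                           for p, push in delta.get((state, rest[0], top), ())]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


delta = grammar_to_pda(PRODUCTIONS, TERMINALS)
for w in ["a*(a1+b)", "b00", "a+", "(a"]:
    print(w, accepts_by_empty_stack(delta, w, "E"))
```

The table has ten nonempty entries: one each for $I$ and $E$ (rule 1, items a and b) and one for each of the eight terminals (rule 2, item c), matching the count in the text.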


Theorem 6.13: If PDA P is constructed from CFG G by the construction 
above, then N(P) = L(G). 
PROOF: We shall prove that $w$ is in $N(P)$ if and only if $w$ is in $L(G)$.


(If) Suppose $w$ is in $L(G)$. Then $w$ has a leftmost derivation


$$S = \gamma_1 \underset{lm}{\Rightarrow} \gamma_2 \underset{lm}{\Rightarrow} \cdots \underset{lm}{\Rightarrow} \gamma_n = w$$


We show by induction on $i$ that $(q, w, S) \vdash^*_P (q, y_i, \alpha_i)$, where $y_i$ and $\alpha_i$ are a
representation of the left-sentential form $\gamma_i$. That is, let $\alpha_i$ be the tail of $\gamma_i$,
and let $\gamma_i = x_i\alpha_i$. Then $y_i$ is that string such that $x_iy_i = w$; i.e., it is what
remains when $x_i$ is removed from the input.


BASIS: For $i = 1$, $\gamma_1 = S$. Thus, $x_1 = \epsilon$, and $y_1 = w$. Since $(q, w, S) \vdash^* (q, w, S)$
by 0 moves, the basis is proved.


INDUCTION: Now we consider the case of the second and subsequent left-sentential forms. We assume


$$(q, w, S) \vdash^* (q, y_i, \alpha_i)$$


and prove $(q, w, S) \vdash^* (q, y_{i+1}, \alpha_{i+1})$. Since $\alpha_i$ is a tail, it begins with a variable
$A$. Moreover, the step of the derivation $\gamma_i \Rightarrow \gamma_{i+1}$ involves replacing $A$ by one of
its production bodies, say $\beta$. Rule (1) of the construction of $P$ lets us replace $A$
at the top of the stack by $\beta$, and rule (2) then allows us to match any terminals
on top of the stack with the next input symbols. As a result, we reach the ID
$(q, y_{i+1}, \alpha_{i+1})$, which represents the next left-sentential form $\gamma_{i+1}$.


To complete the proof, we note that $\alpha_n = \epsilon$, since the tail of $\gamma_n$ (which is $w$)
is empty, and $y_n = \epsilon$, since $x_n = w$. Thus, $(q, w, S) \vdash^* (q, \epsilon, \epsilon)$, which proves that $P$ accepts $w$ by empty
stack.


(Only-if) We need to prove something more general: that if $P$ executes a sequence of moves that has the net effect of popping a variable $A$ from the top of
its stack, without ever going below $A$ on the stack, then $A$ derives, in $G$, whatever input string was consumed from the input during this process. Precisely:


• If $(q, x, A) \vdash^*_P (q, \epsilon, \epsilon)$, then $A \Rightarrow^*_G x$.


The proof is an induction on the number of moves taken by P. 


BASIS: One move. The only possibility is that $A \to \epsilon$ is a production of $G$, and
this production is used in a rule of type (1) by the PDA $P$. In this case, $x = \epsilon$,
and we know that $A \Rightarrow \epsilon$.


INDUCTION: Suppose $P$ takes $n$ moves, where $n > 1$. The first move must be
of type (1), where $A$ is replaced by one of its production bodies on the top of
the stack. The reason is that a rule of type (2) can only be used when there is a
terminal on top of the stack. Suppose the production used is $A \to Y_1Y_2 \cdots Y_k$,
where each $Y_i$ is either a terminal or variable.

The next $n - 1$ moves of $P$ must consume $x$ from the input and have the
net effect of popping each of $Y_1$, $Y_2$, and so on from the stack, one at a time.
We can break $x$ into $x_1x_2 \cdots x_k$, where $x_1$ is the portion of the input consumed
until $Y_1$ is popped off the stack (i.e., the stack first is as short as $k - 1$ symbols).
Then $x_2$ is the next portion of the input that is consumed while popping $Y_2$ off
the stack, and so on.


Figure 6.9 suggests how the input $x$ is broken up, and the corresponding
effects on the stack. There, we suggest that $\beta$ was $BaC$, so $x$ is divided into
three parts $x_1x_2x_3$, where $x_2 = a$. Note that in general, if $Y_i$ is a terminal, then
$x_i$ must be that terminal.


Figure 6.9: The PDA P consumes x and pops BaC from its stack 


Formally, we can conclude that $(q, x_ix_{i+1} \cdots x_k, Y_i) \vdash^* (q, x_{i+1} \cdots x_k, \epsilon)$ for
all $i = 1, 2, \ldots, k$. Moreover, none of these sequences can be more than $n - 1$
moves, so the inductive hypothesis applies if $Y_i$ is a variable. That is, we may
conclude $Y_i \Rightarrow^* x_i$.

If $Y_i$ is a terminal, then there must be only one move involved, and it matches
the one symbol of $x_i$ against $Y_i$, which are the same. Again, we can conclude
$Y_i \Rightarrow^* x_i$; this time, zero steps are used. Now we have the derivation


$$A \Rightarrow Y_1Y_2 \cdots Y_k \Rightarrow^* x_1Y_2 \cdots Y_k \Rightarrow^* \cdots \Rightarrow^* x_1x_2 \cdots x_k$$

That is, $A \Rightarrow^* x$.

To complete the proof, we let $A = S$ and $x = w$. Since we are given that
$w$ is in $N(P)$, we know that $(q, w, S) \vdash^* (q, \epsilon, \epsilon)$. By what we have just proved
inductively, we have $S \Rightarrow^* w$; i.e., $w$ is in $L(G)$.
6.3.2 From PDA’s to Grammars


Now, we complete the proofs of equivalence by showing that for every PDA P, 
we can find a CFG G whose language is the same language that P accepts by 
empty stack. The idea behind the proof is to recognize that the fundamental 
event in the history of a PDA’s processing of a given input is the net popping 
of one symbol off the stack, while consuming some input. A PDA may change 
state as it pops stack symbols, so we should also note the state that it enters 
when it finally pops a level off its stack. 




Figure 6.10: A PDA makes a sequence of moves that have the net effect of 
popping a symbol off the stack 


Figure 6.10 suggests how we pop a sequence of symbols $Y_1, Y_2, \ldots, Y_k$ off the
stack. Some input $x_1$ is read while $Y_1$ is popped. We should emphasize that
this “pop” is the net effect of (possibly) many moves. For example, the first
move may change $Y_1$ to some other symbol $Z$. The next move may replace $Z$
by $UV$, later moves have the effect of popping $U$, and then other moves pop $V$.
The net effect is that $Y_1$ has been replaced by nothing; i.e., it has been popped,
and all the input symbols consumed so far constitute $x_1$.

We also show in Fig. 6.10 the net change of state. We suppose that the PDA
starts out in state $p_0$, with $Y_1$ at the top of the stack. After all the moves whose
net effect is to pop $Y_1$, the PDA is in state $p_1$. It then proceeds to (net) pop
$Y_2$, while reading input string $x_2$ and winding up, perhaps after many moves,
in state $p_2$ with $Y_2$ off the stack. The computation proceeds until each of the
symbols on the stack is removed.

Our construction of an equivalent grammar uses variables each of which 
represents an “event” consisting of: 


1. The net popping of some symbol X from the stack, and 




2. A change in state from some $p$ at the beginning to $q$ when $X$ has finally
been replaced by $\epsilon$ on the stack.


We represent such a variable by the composite symbol $[pXq]$. Remember that
this sequence of characters is our way of describing one variable; it is not five 
grammar symbols. The formal construction is given by the next theorem. 


Theorem 6.14: Let $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0)$ be a PDA. Then there is a context-free grammar $G$ such that $L(G) = N(P)$.

PROOF: We shall construct $G = (V, \Sigma, R, S)$, where the set of variables $V$
consists of:


1. The special symbol $S$, which is the start symbol, and


2. All symbols of the form $[pXq]$, where $p$ and $q$ are states in $Q$, and $X$ is a
stack symbol, in $\Gamma$.


The productions of G are as follows: 


a) For all states $p$, $G$ has the production $S \to [q_0Z_0p]$. Recall our intuition
that a symbol like $[q_0Z_0p]$ is intended to generate all those strings $w$ that
cause $P$ to pop $Z_0$ from its stack while going from state $q_0$ to state $p$.
That is, $(q_0, w, Z_0) \vdash^* (p, \epsilon, \epsilon)$. If so, then these productions say that start
symbol $S$ will generate all strings $w$ that cause $P$ to empty its stack, after
starting in its initial ID.


b) Let $\delta(q, a, X)$ contain the pair $(r, Y_1Y_2 \cdots Y_k)$, where:


1. $a$ is either a symbol in $\Sigma$ or $a = \epsilon$.


2. $k$ can be any number, including 0, in which case the pair is $(r, \epsilon)$.

Then for all lists of states $r_1, r_2, \ldots, r_k$, $G$ has the production

$$[qXr_k] \to a[rY_1r_1][r_1Y_2r_2] \cdots [r_{k-1}Y_kr_k]$$


This production says that one way to pop $X$ and go from state $q$ to state
$r_k$ is to read $a$ (which may be $\epsilon$), then use some input to pop $Y_1$ off the
stack while going from state $r$ to state $r_1$, then read some more input that
pops $Y_2$ off the stack and goes from state $r_1$ to $r_2$, and so on.
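Productions (a) and (b) can be generated mechanically. In the Python sketch below, which is an illustration under our own encoding, the start symbol is the tuple `("S",)`, the variable $[pXq]$ is the triple `(p, X, q)`, terminals are one-character strings, and transition tables map `(state, input_symbol_or_"", top)` to sets of `(state, pushed_string)` pairs with stack symbols written top first. The helper `language_up_to` enumerates terminal strings of bounded length by leftmost derivation; its length-based pruning assumes every variable derives only nonempty strings, which holds for the usage example (the if/else PDA $P_N$ of Example 6.10) but not for every grammar this construction produces.

```python
from itertools import product


def pda_to_cfg(states, delta, q0, z0):
    """Productions of the grammar of Theorem 6.14 as (head, body) pairs."""
    start = ("S",)
    prods = [(start, ((q0, z0, p),)) for p in states]          # rule (a)
    for (q, a, x), pairs in delta.items():
        for r, push in pairs:                                     # rule (b)
            k = len(push)
            for rs in product(states, repeat=k):                  # r1, ..., rk
                chain = (r,) + rs                                 # r, r1, ..., rk
                body = ((a,) if a else ()) + tuple(
                    (chain[i], push[i], chain[i + 1]) for i in range(k))
                prods.append(((q, x, chain[-1]), body))
    return prods


def language_up_to(prods, n, start=("S",)):
    """Terminal strings of length <= n, found by leftmost derivations.

    Assumes every variable derives only nonempty strings, so sentential
    forms never shrink and forms longer than n can be discarded.
    """
    bodies = {}
    for head, body in prods:
        bodies.setdefault(head, []).append(body)
    found, seen, todo = set(), set(), [(start,)]
    while todo:
        form = todo.pop()
        if form in seen or len(form) > n:
            continue
        seen.add(form)
        i = next((k for k, s in enumerate(form) if isinstance(s, tuple)), None)
        if i is None:                      # no variables left
            found.add("".join(form))
            continue
        for body in bodies.get(form[i], []):
            todo.append(form[:i] + body + form[i + 1:])
    return found


# P_N of Example 6.10, which has the single state q.
delta_n = {("q", "i", "Z"): {("q", "ZZ")}, ("q", "e", "Z"): {("q", "")}}
prods = pda_to_cfg(["q"], delta_n, "q", "Z")
for head, body in prods:
    print(head, "->", body)
print(sorted(language_up_to(prods, 5)))  # ['e', 'iee', 'ieiee', 'iieee']
```

With one state, each transition yields exactly one production: $S \to [qZq]$, $[qZq] \to i[qZq][qZq]$, and $[qZq] \to e$. With more states, the loop over `product(states, repeat=k)` produces one production for every choice of $r_1, \ldots, r_k$, as rule (b) requires.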


We shall now prove that the informal interpretation of the variables $[qXp]$ is
correct:


• $[qXp] \Rightarrow^* w$ if and only if $(q, w, X) \vdash^* (p, \epsilon, \epsilon)$.


(If) Suppose $(q, w, X) \vdash^* (p, \epsilon, \epsilon)$. We shall show $[qXp] \Rightarrow^* w$ by induction on
the number of moves made by the PDA.
BASIS: One step. Then $(p, \epsilon)$ must be in $\delta(q, w, X)$, and $w$ is either a single symbol or $\epsilon$. By the construction of $G$, $[qXp] \to w$ is a production, so
$[qXp] \Rightarrow w$.


INDUCTION: Suppose the sequence $(q, w, X) \vdash^* (p, \epsilon, \epsilon)$ takes $n$ steps, and
$n > 1$. The first move must look like


$$(q, w, X) \vdash (r_0, x, Y_1Y_2 \cdots Y_k) \vdash^* (p, \epsilon, \epsilon)$$


where $w = ax$ for some $a$ that is either $\epsilon$ or a symbol in $\Sigma$. It follows that the
pair $(r_0, Y_1Y_2 \cdots Y_k)$ must be in $\delta(q, a, X)$. Further, by the construction of $G$,
there is a production $[qXr_k] \to a[r_0Y_1r_1][r_1Y_2r_2] \cdots [r_{k-1}Y_kr_k]$, where:


1. $r_k = p$, and
2. $r_1, r_2, \ldots, r_{k-1}$ are any states in $Q$.


In particular, we may observe, as was suggested in Fig. 6.10, that each of
the symbols $Y_1, Y_2, \ldots, Y_k$ gets popped off the stack in turn, and we may choose
$r_i$ to be the state of the PDA when $Y_i$ is popped, for $i = 1, 2, \ldots, k - 1$. Let
$x = w_1w_2 \cdots w_k$, where $w_i$ is the input consumed while $Y_i$ is popped off the
stack. Then we know that $(r_{i-1}, w_i, Y_i) \vdash^* (r_i, \epsilon, \epsilon)$.

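As none of these sequences of moves can take as many as n moves, the
inductive hypothesis applies to them. We conclude that $[r_{i-1} Y_i r_i] \stackrel{*}{\Rightarrow} w_i$. We
may put these derivations together with the first production used to conclude:


$$\begin{aligned}
[qXr_k] &\Rightarrow a[r_0 Y_1 r_1][r_1 Y_2 r_2] \cdots [r_{k-1} Y_k r_k] \stackrel{*}{\Rightarrow} \\
&\quad a w_1 [r_1 Y_2 r_2][r_2 Y_3 r_3] \cdots [r_{k-1} Y_k r_k] \stackrel{*}{\Rightarrow} \\
&\quad a w_1 w_2 [r_2 Y_3 r_3] \cdots [r_{k-1} Y_k r_k] \stackrel{*}{\Rightarrow} \cdots \stackrel{*}{\Rightarrow} \\
&\quad a w_1 w_2 \cdots w_k = w
\end{aligned}$$


where $r_k = p$.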
(Only-if) The proof is an induction on the number of steps in the derivation. 


BASIS: One step. Then $[qXp] \to w$ must be a production. The only way for
this production to exist is if there is a transition of P in which X is popped
and state q becomes state p. That is, $(p, \epsilon)$ must be in $\delta(q, a, X)$, and a = w.
But then $(q, w, X) \vdash (p, \epsilon, \epsilon)$.


INDUCTION: Suppose $[qXp] \stackrel{*}{\Rightarrow} w$ by n steps, where n > 1. Consider the first
sentential form explicitly, which must look like


$$[qXr_k] \Rightarrow a[r_0 Y_1 r_1][r_1 Y_2 r_2] \cdots [r_{k-1} Y_k r_k] \stackrel{*}{\Rightarrow} w$$


where $r_k = p$. This production must come from the fact that $(r_0, Y_1 Y_2 \cdots Y_k)$
is in $\delta(q, a, X)$.

We can break w into $w = a w_1 w_2 \cdots w_k$ such that $[r_{i-1} Y_i r_i] \stackrel{*}{\Rightarrow} w_i$ for all
$i = 1, 2, \ldots, k$. By the inductive hypothesis, we know that for all i,
$$(r_{i-1}, w_i, Y_i) \stackrel{*}{\vdash} (r_i, \epsilon, \epsilon)$$


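If we use Theorem 6.5 to put the correct strings beyond $w_i$ on the input and
below $Y_i$ on the stack, we also know that


$$(r_{i-1}, w_i w_{i+1} \cdots w_k, Y_i Y_{i+1} \cdots Y_k) \stackrel{*}{\vdash} (r_i, w_{i+1} \cdots w_k, Y_{i+1} \cdots Y_k)$$
If we put all these sequences together, we see that


$$\begin{aligned}
(q, a w_1 w_2 \cdots w_k, X) &\vdash (r_0, w_1 w_2 \cdots w_k, Y_1 Y_2 \cdots Y_k) \stackrel{*}{\vdash} \\
&\quad (r_1, w_2 w_3 \cdots w_k, Y_2 Y_3 \cdots Y_k) \stackrel{*}{\vdash} (r_2, w_3 \cdots w_k, Y_3 \cdots Y_k) \stackrel{*}{\vdash} \cdots \stackrel{*}{\vdash} (r_k, \epsilon, \epsilon)
\end{aligned}$$


Since $r_k = p$, we have shown that $(q, w, X) \stackrel{*}{\vdash} (p, \epsilon, \epsilon)$.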


We complete the proof as follows. $S \stackrel{*}{\Rightarrow} w$ if and only if $[q_0 Z_0 p] \stackrel{*}{\Rightarrow} w$ for some
p, because of the way the rules for start symbol S are constructed. We just
proved that $[q_0 Z_0 p] \stackrel{*}{\Rightarrow} w$ if and only if $(q_0, w, Z_0) \stackrel{*}{\vdash} (p, \epsilon, \epsilon)$, i.e., if and only if
P accepts w by empty stack. Thus, L(G) = N(P).


Example 6.15: Let us convert the PDA $P_N = (\{q\}, \{i, e\}, \{Z\}, \delta_N, q, Z)$ from
Example 6.10 to a grammar. Recall that $P_N$ accepts all strings that violate, for
the first time, the rule that every e (else) must correspond to some preceding
i (if). Since $P_N$ has only one state and one stack symbol, the construction is
particularly simple. There are only two variables in the grammar G:


a) S, the start symbol, which is in every grammar constructed by the method 
of Theorem 6.14, and 


b) $[qZq]$, the only triple that can be assembled from the states and stack
symbols of $P_N$.


The productions of grammar G are as follows: 


1. The only production for S is $S \to [qZq]$. However, if there were n states
of the PDA, then there would be n productions of this type, since the last 
state could be any of the n states. The first state would have to be the 
start state, and the stack symbol would have to be the start symbol, as 
in our production above. 


2. From the fact that $\delta_N(q, i, Z)$ contains $(q, ZZ)$, we get the production
$[qZq] \to i[qZq][qZq]$. Again, for this simple example, there is only one
production. However, if there were n states, then this one rule would
produce $n^2$ productions, since the middle two states of the body could be
any one state r, and the last states of the head and body could also be
any one state p. That is, if p and r were any two states of the PDA, then
production $[qZp] \to i[qZr][rZp]$ would be produced.

3. From the fact that $\delta_N(q, e, Z)$ contains $(q, \epsilon)$, we have production
$$[qZq] \to e$$


Notice that in this case, the list of stack symbols by which Z is replaced
is empty, so the only symbol in the body is the input symbol that caused
the move.


We may, for convenience, replace the triple $[qZq]$ by some less complex
symbol, say A. If we do, then the complete grammar consists of the productions:


$S \to A$
$A \to iAA \mid e$


In fact, if we notice that A and S derive exactly the same strings, we may
identify them as one, and write the complete grammar as


$$G = (\{S\}, \{i, e\}, \{S \to iSS \mid e\}, S)$$
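One way to gain confidence in the final grammar is to compare it with $P_N$ on all short strings. The sketch below is our own check, not part of the example. It assumes the alphabet {i, e} and simulates $P_N$ by tracking only the height of its stack, since every stack symbol is Z. It then computes the strings of $S \to iSS \mid e$ up to a length bound by fixed-point iteration. The helper names are hypothetical.

```python
from itertools import product

def pn_accepts(w):
    """Simulate P_N by empty stack: the stack holds `height` copies of Z."""
    height = 1                           # initial ID has Z on the stack
    for ch in w:
        if height == 0:
            return False                 # stack already empty: P_N cannot move
        height += 1 if ch == "i" else -1   # i replaces Z by ZZ; e pops Z
    return height == 0

def grammar_strings(max_len):
    """Strings of length <= max_len derivable from S -> iSS | e."""
    lang = set()
    while True:
        new = {"i" + x + y for x in lang for y in lang
               if 1 + len(x) + len(y) <= max_len}
        if max_len >= 1:
            new.add("e")
        if new == lang:                  # fixed point reached
            return lang
        lang = new

def agree_up_to(max_len):
    """True iff P_N and the grammar accept the same strings of length <= max_len."""
    by_pda = {"".join(t) for n in range(max_len + 1)
              for t in product("ie", repeat=n) if pn_accepts("".join(t))}
    return by_pda == grammar_strings(max_len)

print(agree_up_to(9))   # True
```

Agreement on short strings is evidence, not a proof. The proof is the construction of Theorem 6.14.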


6.3.3 Exercises for Section 6.3 


* Exercise 6.3.1: Convert the grammar


$S \to 0S1 \mid A$
$A \to 1A0 \mid S \mid \epsilon$


to a PDA that accepts the same language by empty stack.


Exercise 6.3.2: Convert the grammar 


$S \to aAA$
$A \to aS \mid bS \mid a$


to a PDA that accepts the same language by empty stack. 


* Exercise 6.3.3: Convert the PDA $P = (\{p, q\}, \{0, 1\}, \{X, Z_0\}, \delta, q, Z_0)$ to a
CFG, if $\delta$ is given by:


1. $\delta(q, 1, Z_0) = \{(q, XZ_0)\}$.
2. $\delta(q, 1, X) = \{(q, XX)\}$.
3. $\delta(q, 0, X) = \{(p, X)\}$.
4. $\delta(q, \epsilon, X) = \{(q, \epsilon)\}$.
5. $\delta(p, 1, X) = \{(p, \epsilon)\}$.
6. $\delta(p, 0, Z_0) = \{(q, Z_0)\}$.


* Exercise 6.3.4: Convert the PDA of Exercise 6.1.1 to a context-free grammar.


Exercise 6.3.5: Below are some context-free languages. For each, devise a 
PDA that accepts the language by empty stack. You may, if you wish, first 
construct a grammar for the language, and then convert to a PDA. 


a) $\{a^n b^m c^{2(n+m)} \mid n \ge 0, m \ge 0\}$.
b) $\{a^i b^j c^k \mid i = 2j \text{ or } j = 2k\}$.
! c) $\{0^n 1^m \mid n \le m \le 2n\}$.


Exercise 6.3.6: Show that if P is a PDA, then there is a one-state PDA $P_1$
such that $N(P_1) = N(P)$.


Exercise 6.3.7: Suppose we have a PDA with s states, t stack symbols, and 
no rule in which a replacement stack string has length greater than u. Give a 
tight upper bound on the number of variables in the CFG that we construct 
for this PDA by the method of Section 6.3.2. 


6.4 Deterministic Pushdown Automata 


While PDA’s are by definition allowed to be nondeterministic, the determin- 
istic subcase is quite important. In particular, parsers generally behave like 
deterministic PDA’s, so the class of languages that can be accepted by these 
automata is interesting for the insights it gives us into what constructs are 
suitable for use in programming languages. In this section, we shall define 
deterministic PDA’s and investigate some of the things they can and cannot 
do. 


6.4.1 Definition of a Deterministic PDA 


Intuitively, a PDA is deterministic if there is never a choice of move in any
situation. These choices are of two kinds. If $\delta(q, a, X)$ contains more than one
pair, then surely the PDA is nondeterministic because we can choose among
these pairs when deciding on the next move. However, even if $\delta(q, a, X)$ is
always a singleton, we could still have a choice between using a real input symbol,
or making a move on $\epsilon$. Thus, we define a PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ to
be deterministic (a deterministic PDA or DPDA), if and only if the following
conditions are met:


1. $\delta(q, a, X)$ has at most one member for any q in Q, a in $\Sigma$ or $a = \epsilon$, and
X in $\Gamma$.


2. If $\delta(q, a, X)$ is nonempty, for some a in $\Sigma$, then $\delta(q, \epsilon, X)$ must be empty.
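The two conditions can be checked mechanically. In the sketch below, which uses our own representation rather than the book's, δ is a Python dictionary from (q, a, X) to a set of moves, with a = "" standing for ε. The name is_deterministic is a hypothetical helper.

```python
def is_deterministic(delta):
    """Check conditions 1 and 2 of the DPDA definition.

    delta maps (state, a, X) -> set of (state, push) pairs; a == "" is epsilon.
    """
    for (q, a, X), moves in delta.items():
        if len(moves) > 1:
            return False    # condition 1: more than one member
        if a != "" and moves and delta.get((q, "", X)):
            return False    # condition 2: a real-input move competes with an epsilon-move
    return True
```

An ε-move is still allowed, provided that no real-input move exists for the same state and stack symbol.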

Example 6.16: It turns out that the language $L_{wwr}$ of Example 6.2 is a CFL
that has no DPDA. However, by putting a “center-marker” c in the middle, we
can make the language recognizable by a DPDA. That is, we can recognize the
language $L_{wcwr} = \{wcw^R \mid w \text{ is in } (\mathbf{0} + \mathbf{1})^*\}$ by a deterministic PDA.

The strategy of the DPDA is to store 0’s and 1’s on its stack, until it sees
the center marker c. It then goes to another state, in which it matches input
symbols against stack symbols and pops the stack if they match. If it ever finds
a nonmatch, it dies; its input cannot be of the form $wcw^R$. If it succeeds in
popping its stack down to the initial symbol, which marks the bottom of the
stack, then it accepts its input.

The idea is very much like the PDA that we saw in Fig. 6.2. However, that
PDA is nondeterministic, because in state $q_0$ it always has the choice of pushing
the next input symbol onto the stack or making a transition on $\epsilon$ to state $q_1$;
i.e., it has to guess when it has reached the middle. The DPDA for $L_{wcwr}$ is
shown as a transition diagram in Fig. 6.11.

This PDA is clearly deterministic. It never has a choice of move in the same
state, using the same input and stack symbol. As for choices between using a
real input symbol or $\epsilon$, the only $\epsilon$-transition it makes is from $q_1$ to $q_2$ with $Z_0$
at the top of the stack. However, in state $q_1$, there are no other moves when
$Z_0$ is at the stack top.


Figure 6.11: A deterministic PDA accepting $L_{wcwr}$
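The transition diagram of Fig. 6.11 is not reproduced here, so the following Python sketch rebuilds a DPDA for $L_{wcwr}$ from the prose description alone. The exact transitions, the state names q0, q1, q2, and the function run_dpda are our own assumptions. Because the machine is deterministic, δ maps each key to a single move rather than a set. The runner prefers a real-input move and otherwise takes the ε-move, which can never be ambiguous in a DPDA. It also assumes that the machine never enters an infinite run of ε-moves.

```python
def run_dpda(delta, start, bottom, accepting, w):
    """Simulate a DPDA that accepts by final state.

    delta maps (state, a, top) -> (new_state, push); a == "" means epsilon and
    push is a tuple, listed top first, that replaces the popped top symbol.
    Assumes the DPDA never enters an infinite run of epsilon-moves.
    """
    state, stack, pos = start, [bottom], 0      # stack[-1] is the top
    while True:
        if pos == len(w) and state in accepting:
            return True
        if not stack:
            return False                         # no move is possible on an empty stack
        top = stack[-1]
        if pos < len(w) and (state, w[pos], top) in delta:
            state, push = delta[(state, w[pos], top)]
            pos += 1
        elif (state, "", top) in delta:
            state, push = delta[(state, "", top)]
        else:
            return False                         # no move: the DPDA dies
        stack.pop()
        stack.extend(reversed(push))


# Assumed reconstruction of Fig. 6.11 from the prose description.
LWCWR = {}
for top in ("0", "1", "Z0"):
    for a in ("0", "1"):
        LWCWR[("q0", a, top)] = ("q0", (a, top))    # push the input symbol
    LWCWR[("q0", "c", top)] = ("q1", (top,))        # center marker: switch states
LWCWR[("q1", "0", "0")] = ("q1", ())                # matching 0: pop
LWCWR[("q1", "1", "1")] = ("q1", ())                # matching 1: pop
LWCWR[("q1", "", "Z0")] = ("q2", ("Z0",))           # bottom reached: go to accepting q2

def in_lwcwr(w):
    return run_dpda(LWCWR, "q0", "Z0", {"q2"}, w)
```

For example, in_lwcwr("01c10") holds. For "01c01", however, the machine dies at the first mismatch after the marker.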


6.4.2 Regular Languages and Deterministic PDA’s 


The DPDA’s accept a class of languages that is between the regular languages 
and the CFL’s. We shall first prove that the DPDA languages include all the 
regular languages. 


Theorem 6.17: If L is a regular language, then L = L(P) for some DPDA P. 


PROOF: Essentially, a DPDA can simulate a deterministic finite automaton.
The PDA keeps some stack symbol $Z_0$ on its stack, because a PDA has to have
a stack, but really the PDA ignores its stack and just uses its state. Formally,
let $A = (Q, \Sigma, \delta_A, q_0, F)$ be a DFA. Construct DPDA


$$P = (Q, \Sigma, \{Z_0\}, \delta_P, q_0, Z_0, F)$$


by defining $\delta_P(q, a, Z_0) = \{(p, Z_0)\}$ for all states p and q in Q, such that
$\delta_A(q, a) = p$.

We claim that $(q_0, w, Z_0) \stackrel{*}{\vdash} (p, \epsilon, Z_0)$ if and only if $\hat{\delta}_A(q_0, w) = p$. That is,
P simulates A using its state. The proofs in both directions are easy inductions
on |w|, and we leave them for the reader to complete. Since both A and P
accept by entering one of the states of F, we conclude that their languages are
the same.
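The construction of $\delta_P$ amounts to a one-line dictionary comprehension. In this sketch, the DFA's transition function is assumed to be a Python dictionary from (q, a) to p, and the bottom marker is the string "Z0". The helper names and the example DFA, which accepts strings with an even number of 1’s, are hypothetical.

```python
from itertools import product

def dfa_to_dpda(dfa_delta):
    """delta_P(q, a, Z0) = (p, Z0) whenever delta_A(q, a) = p."""
    return {(q, a, "Z0"): (p, ("Z0",)) for (q, a), p in dfa_delta.items()}

def dpda_accepts(delta_p, start, accepting, w):
    """Run the DPDA (it has no epsilon-moves) and accept by final state."""
    state, stack = start, ["Z0"]
    for a in w:
        key = (state, a, stack[-1])
        if key not in delta_p:
            return False
        state, push = delta_p[key]
        stack.pop()
        stack.extend(reversed(push))   # push is listed top first; here it is always Z0
    return state in accepting

# Hypothetical DFA: strings over {0, 1} with an even number of 1's.
even_ones = {("even", "0"): "even", ("even", "1"): "odd",
             ("odd", "0"): "odd", ("odd", "1"): "even"}
P = dfa_to_dpda(even_ones)
```

The stack never changes from $Z_0$, which mirrors the claim that P simulates A using only its state.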


If we want the DPDA to accept by empty stack, then we find that our 
language-recognizing capability is rather limited. Say that a language L has 
the prefix property if there are no two different strings x and y in L such that 
x is a prefix of y. 


Example 6.18: The language $L_{wcwr}$ of Example 6.16 has the prefix property.
That is, it is not possible for there to be two strings $wcw^R$ and $xcx^R$, one of
which is a prefix of the other, unless they are the same string. To see why,
suppose $wcw^R$ is a prefix of $xcx^R$, and $w \ne x$. Then w must be shorter than
x. Therefore, the c in $wcw^R$ comes in a position where $xcx^R$ has a 0 or 1; it is
a position in the first x. That point contradicts the assumption that $wcw^R$ is
a prefix of $xcx^R$.

On the other hand, there are some very simple languages that do not have 
the prefix property. Consider {0}*, i.e., the set of all strings of 0’s. Clearly, 
there are pairs of strings in this language one of which is a prefix of the other, 
so this language does not have the prefix property. In fact, of any two strings, 
one is a prefix of the other, although that condition is stronger than we need 
to establish that the prefix property does not hold. 
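For a finite set of strings, the prefix property is easy to test. For an infinite language such as $L_{wcwr}$, however, a program can only examine a finite sample, so the sketch below illustrates the definition rather than proving anything. It relies on one fact about lexicographic order: if x is a prefix of y, then every string that sorts between them also begins with x. The name has_prefix_property is hypothetical.

```python
from itertools import product

def has_prefix_property(strings):
    """True iff no string in the finite collection is a prefix of a different one."""
    ordered = sorted(set(strings))
    # If x is a prefix of y, every string sorted between them also begins with x,
    # so it suffices to compare each string with its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

# Finite sample of L_wcwr: all w of length at most 3.
sample = {w + "c" + w[::-1]
          for n in range(4) for w in ("".join(t) for t in product("01", repeat=n))}
zeros = {"0" * n for n in range(5)}      # finite sample of {0}*
```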


Note that the language {0}* is a regular language. Thus, it is not even true 
that every regular language is N(P) for some DPDA P. We leave as an exercise 
the following relationship: 


Theorem 6.19: A language L is N(P) for some DPDA P if and only if L has
the prefix property and L is L(P') for some DPDA P'.


6.4.3 DPDA’s and Context-Free Languages 


We have already seen that a DPDA can accept languages like $L_{wcwr}$ that are
not regular. To see this language is not regular, suppose it were, and use the
pumping lemma. If n is the constant of the pumping lemma, then consider the
string $w = 0^n c 0^n$, which is in $L_{wcwr}$. However, when we “pump” this string, it
is the first group of 0’s whose length must change (the pumped piece lies within
the first n symbols, all of which are 0’s), so the pumping lemma claims that $L_{wcwr}$ contains strings
that have the “center” marker not in the center. Since these strings are not in
$L_{wcwr}$, we have a contradiction and conclude that $L_{wcwr}$ is not regular.

On the other hand, there are CFL’s like $L_{wwr}$ that cannot be L(P) for any
DPDA P. A formal proof is complex, but the intuition is transparent. If P is
a DPDA accepting $L_{wwr}$, then given a sequence of 0’s, it must store them on
the stack, or do something equivalent to count an arbitrary number of 0’s. For
instance, it could store one X for every two 0’s it sees, and use the state to
remember whether the number was even or odd.

Suppose P has seen n 0’s and then sees $110^n$. It must verify that there
were n 0’s after the 11, and to do so it must pop its stack. Now, P has seen
$0^n 110^n$. If it sees an identical string next, it must accept, because the complete
input is of the form $ww^R$, with $w = 0^n 110^n$. However, if it sees $0^m 110^m$ for
some $m \ne n$, P must not accept. Since its stack is empty, it cannot remember
what arbitrary integer n was,⁵ and must fail to recognize $L_{wwr}$ correctly. Our
conclusion is that:


• The languages accepted by DPDA’s by final state properly include the
regular languages, but are properly included in the CFL’s.


6.4.4 DPDA’s and Ambiguous Grammars 


We can refine the power of the DPDA’s by noting that the languages they accept 
all have unambiguous grammars. Unfortunately, the DPDA languages are not 
exactly equal to the subset of the CFL’s that are not inherently ambiguous. 
For instance, $L_{wwr}$ has an unambiguous grammar


$$S \to 0S0 \mid 1S1 \mid \epsilon$$


even though it is not a DPDA language. The following theorems refine the 
bullet point above. 


Theorem 6.20: If L = N(P) for some DPDA P, then L has an unambiguous 
context-free grammar. 


PROOF: We claim that the construction of Theorem 6.14 yields an unambiguous 
CFG G when the PDA to which it is applied is deterministic. First recall from 
Theorem 5.29 that it is sufficient to show that the grammar has unique leftmost 
derivations in order to prove that G is unambiguous. 

Suppose P accepts string w by empty stack. Then it does so by a unique 
sequence of moves, because it is deterministic, and cannot move once its stack 
is empty. Knowing this sequence of moves, we can determine the one choice of 
production in a leftmost derivation whereby G derives w. There can never be a 
choice of which rule of P motivated the production to use. However, a rule of
P, say $\delta(q, a, X) = \{(r, Y_1 Y_2 \cdots Y_k)\}$, might cause many productions of G, with
different states in the positions that reflect the states of P after popping each
of $Y_1, Y_2, \ldots, Y_{k-1}$. Because P is deterministic, only one of these sequences of
choices will be consistent with what P actually does, and therefore, only one of
these productions will actually lead to derivation of w.


⁵This statement is the intuitive part that requires a (hard) formal proof; could there be
some other way for P to compare equal blocks of 0’s?


However, we can prove more: even those languages that DPDA’s accept by 
final state have unambiguous grammars. Since we only know how to construct 
grammars directly from PDA’s that accept by empty stack, we need to change 
the language involved to have the prefix property, and then modify the resulting 
grammar to generate the original language. We do so by use of an “endmarker” 
symbol. 


Theorem 6.21: If L = L(P) for some DPDA P, then L has an unambiguous 
CFG. 


PROOF: Let $ be an “endmarker” symbol that does not appear in the strings of
L, and let L' = L$. That is, the strings of L' are the strings of L, each followed
by the symbol $. Then L' surely has the prefix property, and by Theorem 6.19,⁶
L' = N(P') for some DPDA P'. By Theorem 6.20, there is an unambiguous
grammar G' generating the language N(P'), which is L'.

Now, construct from G' a grammar G such that L(G) = L. To do so, we
have only to get rid of the endmarker $ from strings. Thus, treat $ as a variable
of G, and introduce production $ → ε; otherwise, the productions of G' and G
are the same. Since L(G') = L', it follows that L(G) = L.

We claim that G is unambiguous. In proof, the leftmost derivations in G are
exactly the same as the leftmost derivations in G', except that the derivations
in G have a final step in which $ is replaced by ε. Thus, if a terminal string w
had two leftmost derivations in G, then w$ would have two leftmost derivations
in G'. Since we know G' is unambiguous, so is G.
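The endmarker trick changes the grammar by adding only one production, and the one-for-one correspondence of leftmost derivations can be spot-checked on a toy case. The grammar G' below, with productions S → T$ and T → aTb | ε generating {aⁿbⁿ$}, is a hypothetical stand-in that we chose ourselves; it is not the output of Theorem 6.20. The bounded search counts leftmost derivations. It assumes that the search space is finite, which holds here because every sentential form contains at most one T.

```python
from collections import Counter

def leftmost_counts(prods, start, terminals, max_len):
    """Count leftmost derivations of each terminal string with <= max_len terminals.

    prods maps a variable to a list of bodies (tuples of symbols); () is epsilon.
    Assumes the bounded search is finite (true for the toy grammars below).
    """
    counts = Counter()
    pending = [(start,)]
    while pending:
        form = pending.pop()
        idx = next((i for i, s in enumerate(form) if s not in terminals), None)
        if idx is None:                      # no variables left: a terminal string
            counts["".join(form)] += 1
            continue
        for body in prods[form[idx]]:        # expand the leftmost variable
            new = form[:idx] + body + form[idx + 1:]
            if sum(s in terminals for s in new) <= max_len:
                pending.append(new)
    return counts

def drop_endmarker(prods, marker="$"):
    """Treat the endmarker as a variable of G with the single production marker -> epsilon."""
    new = {head: list(bodies) for head, bodies in prods.items()}
    new[marker] = [()]
    return new

g_prime = {"S": [("T", "$")], "T": [("a", "T", "b"), ()]}   # hypothetical G'
g = drop_endmarker(g_prime)
with_marker = leftmost_counts(g_prime, "S", {"a", "b", "$"}, 7)
without = leftmost_counts(g, "S", {"a", "b"}, 6)
```

Each string w counted for G has exactly as many leftmost derivations as w$ has in G'.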


6.4.5 Exercises for Section 6.4 


Exercise 6.4.1: For each of the following PDA’s, tell whether or not it is 
deterministic. Either show that it meets the definition of a DPDA or find a 
rule or rules that violate it. 


a) The PDA of Example 6.2. 
* b) The PDA of Exercise 6.1.1. 
c) The PDA of Exercise 6.3.3. 


Exercise 6.4.2: Give deterministic pushdown automata to accept the following
languages:


a) $\{0^n 1^m \mid n \le m\}$.
b) $\{0^n 1^m \mid n \ge m\}$.
c) $\{0^n 1^m 0^n \mid n \text{ and } m \text{ are arbitrary}\}$.


⁶The proof of Theorem 6.19 appears in Exercise 6.4.3, but we can easily see how to
construct P' from P. Add a new state q that P' enters whenever P is in an accepting state
and the next input is $. In state q, P' pops all symbols off its stack. Also, P' needs its own
bottom-of-stack marker to avoid accidentally emptying its stack as it simulates P.

Exercise 6.4.3: We can prove Theorem 6.19 in three parts: 
* a) Show that if L = N(P) for some DPDA P, then L has the prefix property.


! b) Show that if L = N(P) for some DPDA P, then there exists a DPDA P'
such that L = L(P').


*! c) Show that if L has the prefix property and is L(P') for some DPDA P’, 
then there exists a DPDA P such that L = N(P). 


! Exercise 6.4.4: Show that the language


$$L = \{0^n 1^n \mid n \ge 1\} \cup \{0^n 1^{2n} \mid n \ge 1\}$$


is a context-free language that is not accepted by any DPDA. Hint: Show that
there must be two strings of the form $0^n 1^n$ for different values of n, say $n_1$ and
$n_2$, that cause a hypothetical DPDA for L to enter the same ID after reading
both strings. Intuitively, the DPDA must erase from its stack almost everything
it placed there on reading the 0’s, in order to check that it has seen the same
number of 1’s. Thus, the DPDA cannot tell whether or not to accept next after
seeing $n_1$ 1’s or after seeing $n_2$ 1’s.


6.5 Summary of Chapter 6 


♦ Pushdown Automata: A PDA is a nondeterministic finite automaton coupled
with a stack that can be used to store a string of arbitrary length.
The stack can be read and modified only at its top.


♦ Moves of a Pushdown Automaton: A PDA chooses its next move based
on its current state, the next input symbol, and the symbol at the top
of its stack. It may also choose to make a move independent of the
input symbol and without consuming that symbol from the input. Being
nondeterministic, the PDA may have some finite number of choices of
move; each is a new state and a string of stack symbols with which to
replace the symbol currently on top of the stack.


♦ Acceptance by Pushdown Automata: There are two ways in which we
may allow the PDA to signal acceptance. One is by entering an accepting
state; the other by emptying its stack. These methods are equivalent, in
the sense that any language accepted by one method is accepted (by some
other PDA) by the other method.


♦ Instantaneous Descriptions: We use an ID consisting of the state, remaining
input, and stack contents to describe the “current condition” of
a PDA. A transition function $\vdash$ between ID’s represents single moves of
a PDA.


♦ Pushdown Automata and Grammars: The languages accepted by PDA’s,
either by final state or by empty stack, are exactly the context-free languages.


♦ Deterministic Pushdown Automata: A PDA is deterministic if it never
has a choice of move for a given state, input symbol (including $\epsilon$), and
stack symbol. Also, it never has a choice between making a move using a
true input and a move using $\epsilon$ input.


♦ Acceptance by Deterministic Pushdown Automata: The two modes of acceptance,
final state and empty stack, are not the same for DPDA’s.
Rather, the languages accepted by empty stack are exactly those of the
languages accepted by final state that have the prefix property: no string
in the language is a prefix of another string in the language.


♦ The Languages Accepted by DPDA’s: All the regular languages are accepted
(by final state) by DPDA’s, and there are nonregular languages
accepted by DPDA’s. The DPDA languages are context-free languages,
and in fact are languages that have unambiguous CFG’s. Thus, the DPDA
languages lie strictly between the regular languages and the context-free
languages.


6.6 Gradiance Problems for Chapter 6 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 6.1: Consider the pushdown automaton with the following transition rules:


1. $\delta(q, 0, Z_0) = \{(q, XZ_0)\}$
2. $\delta(q, 0, X) = \{(q, XX)\}$
3. $\delta(q, 1, X) = \{(q, X)\}$
4. $\delta(q, \epsilon, X) = \{(p, \epsilon)\}$
5. $\delta(p, \epsilon, X) = \{(p, \epsilon)\}$
6. $\delta(p, 1, X) = \{(p, XX)\}$
7. $\delta(p, 1, Z_0) = \{(p, \epsilon)\}$


The start state is q. For which of the following inputs can the PDA first enter
state p with the input empty and the stack containing $XXZ_0$ [i.e., the ID
$(p, \epsilon, XXZ_0)$]?


Problem 6.2: For the same PDA as Problem 6.1: from the ID $(p, 1101, XXZ_0)$,
which of the following ID’s can not be reached?


Problem 6.3: In Fig. 6.12 are the transitions of a deterministic pushdown
automaton. The start state is $q_0$, and f is the accepting state.

Describe informally what this PDA does. Then, identify below the one
input string that takes the PDA into state $q_3$ (with any stack).


State-Symbol    a             b             ε
q0 - Z0         (q1, AZ0)     (q2, BZ0)     (f, ε)
q1 - A          (q1, AAA)     (q1, ε)       -
q1 - Z0         -             -             (q0, Z0)
q2 - B          (q3, ε)       (q2, BB)      -
q2 - Z0         -             -             (q0, Z0)
q3 - B          -             -             (q2, ε)
q3 - Z0         -             -             (q1, AZ0)


Figure 6.12: A PDA


Problem 6.4: For the PDA in Fig. 6.12, describe informally what this PDA 
does. Then, identify below the one input string that the PDA accepts. 


Problem 6.5: If we convert the context-free grammar G: 


$S \to AS \mid A$
$A \to 0A \mid 1B \mid 1$
$B \to 0B \mid 0$


to a pushdown automaton that accepts L(G) by empty stack, using the con- 
struction of Section 6.3.1, which of the following would be a rule of the PDA? 


Problem 6.6: Suppose one transition rule of some PDA P is $\delta(q, 0, X) =
\{(p, YZ), (r, XY)\}$. If we convert PDA P to an equivalent context-free grammar
G in the manner described in Section 6.3.2, which of the following could be a
production of G derived from this transition rule? You may assume s and t are
states of P, as well as p, q, and r.

6.7 References for Chapter 6


The idea of the pushdown automaton is attributed independently to Oettinger
[4] and Schützenberger [5]. The equivalence between pushdown automata and
context-free languages was also the result of independent discoveries; it appears
in a 1961 MIT technical report by N. Chomsky but was first published by Evey
[1].

The deterministic PDA was first introduced by Fischer [2] and Schützenberger
[5]. It gained significance later as a model for parsers. Notably, [3]
introduces the “LR(k) grammars,” a subclass of CFG’s that generates exactly
the DPDA languages. The LR(k) grammars, in turn, form the basis for YACC,
the parser-generating tool discussed in Section 5.3.2.

Chapter 7


Properties of Context-Free 
Languages 


We shall complete our study of context-free languages by learning some of 
their properties. Our first task is to simplify context-free grammars; these 
simplifications make it easier to prove facts about CFL’s, since we can claim 
that if a language is a CFL, then it has a grammar in some special form. 

We then prove a “pumping lemma” for CFL’s. This theorem is in the 
same spirit as Theorem 4.1 for regular languages, but can be used to prove 
a language not to be context-free. Next, we consider the sorts of properties 
that we studied in Chapter 4 for the regular languages: closure properties and 
decision properties. We shall see that some, but not all, of the closure properties 
that the regular languages have are also possessed by the CFL’s. Likewise, some 
questions about CFL’s can be decided by algorithms that generalize the tests 
we developed for regular languages, but there are also certain questions about 
CFL’s that we cannot answer. 


7.1 Normal Forms for Context-Free Grammars 


The goal of this section is to show that every CFL (without $\epsilon$) is generated by a
CFG in which all productions are of the form $A \to BC$ or $A \to a$, where A, B,
and C are variables, and a is a terminal. This form is called Chomsky Normal
Form. To get there, we need to make a number of preliminary simplifications,
which are themselves useful in various ways:


1. We must eliminate useless symbols, those variables or terminals that do
not appear in any derivation of a terminal string from the start symbol.


2. We must eliminate $\epsilon$-productions, those of the form $A \to \epsilon$ for some
variable A.

3. We must eliminate unit productions, those of the form $A \to B$ for variables
A and B.


7.1.1 Eliminating Useless Symbols 


We say a symbol X is useful for a grammar G = (V, T, P, S) if there is some
derivation of the form $S \stackrel{*}{\Rightarrow} \alpha X \beta \stackrel{*}{\Rightarrow} w$, where w is in $T^*$. Note that X may be
in either V or T, and the sentential form $\alpha X \beta$ might be the first or last in the
derivation. If X is not useful, we say it is useless. Evidently, omitting useless
symbols from a grammar will not change the language generated, so we may as
well detect and eliminate all useless symbols.

Our approach to eliminating useless symbols begins by identifying the two 
things a symbol has to be able to do to be useful: 


1. We say X is generating if $X \stackrel{*}{\Rightarrow} w$ for some terminal string w. Note that
every terminal is generating, since w can be that terminal itself, which is
derived by zero steps.


2. We say X is reachable if there is a derivation $S \stackrel{*}{\Rightarrow} \alpha X \beta$ for some $\alpha$ and
$\beta$.


Surely a symbol that is useful will be both generating and reachable. If we 
eliminate the symbols that are not generating first, and then eliminate from 
the remaining grammar those symbols that are not reachable, we shall, as will 
be proved, have only the useful symbols left. 


Example 7.1: Consider the grammar: 


$S \to AB \mid a$
$A \to b$


All symbols but B are generating; a and b generate themselves; S generates
a, and A generates b. If we eliminate B, we must eliminate the production
$S \to AB$, leaving the grammar:


$S \to a$
$A \to b$


Now, we find that only S and a are reachable from S. Eliminating A and
b leaves only the production $S \to a$. That production by itself is a grammar
whose language is {a}, just as is the language of the original grammar.

Note that if we start by checking for reachability first, we find that all
symbols of the grammar


$S \to AB \mid a$
$A \to b$

are reachable. If we then eliminate the symbol B because it is not generating,
we are left with a grammar that still has useless symbols, in particular, A and
b.
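The point about order in Example 7.1 can be replayed in a few lines. This illustrative Python sketch uses our own conventions: productions are (head, body) pairs, and each body is a tuple of symbols. The function name eliminate_useless is hypothetical. Its two helper fixpoints are straightforward versions of the generating and reachable computations described in Section 7.1.2.

```python
def eliminate_useless(prods, start, terminals, generating_first=True):
    """Drop nongenerating and unreachable symbols in the chosen order."""

    def generating(ps):
        gen, changed = set(terminals), True
        while changed:
            changed = False
            for head, body in ps:
                if head not in gen and all(s in gen for s in body):
                    gen.add(head)
                    changed = True
        return gen

    def reachable(ps):
        reach, work = {start}, [start]
        while work:
            var = work.pop()
            for head, body in ps:
                if head == var:
                    for s in body:
                        if s not in reach:
                            reach.add(s)
                            work.append(s)
        return reach

    def restrict(ps, keep):
        # keep a production only if its head and every body symbol survive
        return [(h, b) for h, b in ps if h in keep and all(s in keep for s in b)]

    if generating_first:
        ps = restrict(prods, generating(prods))
        return restrict(ps, reachable(ps))
    ps = restrict(prods, reachable(prods))
    return restrict(ps, generating(ps))


example_7_1 = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
```

With the generating-first order, only S → a survives. With the reachable-first order, the useless production A → b is left behind, exactly as the example describes.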


Theorem 7.2: Let G = (V, T, P, S) be a CFG, and assume that $L(G) \ne \emptyset$;
i.e., G generates at least one string. Let $G_1 = (V_1, T_1, P_1, S)$ be the grammar
we obtain by the following steps:


1. First eliminate nongenerating symbols and all productions involving one
or more of those symbols. Let $G_2 = (V_2, T_2, P_2, S)$ be this new grammar.
Note that S must be generating, since we assume L(G) has at least one
string, so S has not been eliminated.


2. Second, eliminate all symbols that are not reachable in the grammar $G_2$.


Then $G_1$ has no useless symbols, and $L(G_1) = L(G)$.


PROOF: Suppose X is a symbol that remains; i.e., X is in $V_1 \cup T_1$. We know
that $X \stackrel{*}{\Rightarrow}_G w$ for some w in $T^*$. Moreover, every symbol used in the derivation
of w from X is also generating. Thus, $X \stackrel{*}{\Rightarrow}_{G_2} w$.

Since X was not eliminated in the second step, we also know that there are
$\alpha$ and $\beta$ such that $S \stackrel{*}{\Rightarrow}_{G_2} \alpha X \beta$. Further, every symbol used in this derivation is
reachable, so $S \stackrel{*}{\Rightarrow}_{G_1} \alpha X \beta$.

We know that every symbol in $\alpha X \beta$ is reachable, and we also know that
all these symbols are in $V_2 \cup T_2$, so each of them is generating in $G_2$. The
derivation of some terminal string, say $\alpha X \beta \stackrel{*}{\Rightarrow}_{G_2} xwy$, involves only symbols
that are reachable from S, because they are reached by symbols in $\alpha X \beta$. Thus,
this derivation is also a derivation of $G_1$; that is,


$$S \stackrel{*}{\Rightarrow}_{G_1} \alpha X \beta \stackrel{*}{\Rightarrow}_{G_1} xwy$$


We conclude that X is useful in $G_1$. Since X is an arbitrary symbol of $G_1$, we
conclude that $G_1$ has no useless symbols.

The last detail is that we must show $L(G_1) = L(G)$. As usual, to show two
sets the same, we show each is contained in the other.


$L(G_1) \subseteq L(G)$: Since we have only eliminated symbols and productions from
G to get $G_1$, it follows that $L(G_1) \subseteq L(G)$.


$L(G) \subseteq L(G_1)$: We must prove that if w is in L(G), then w is in $L(G_1)$. If
w is in L(G), then $S \stackrel{*}{\Rightarrow}_G w$. Each symbol in this derivation is evidently both
reachable and generating, so it is also a derivation of $G_1$. That is, $S \stackrel{*}{\Rightarrow}_{G_1} w$, and
thus w is in $L(G_1)$.

7.1.2 Computing the Generating and Reachable Symbols


Two points remain. How do we compute the set of generating symbols of a 
grammar, and how do we compute the set of reachable symbols of a grammar? 
For both problems, the algorithm we use tries its best to discover symbols of 
these types. We shall show that if the proper inductive constructions of these 
sets fail to discover a symbol to be generating or reachable, respectively, then
the symbol is not of these types. 

Let G = (V,T, P, S) be a grammar. To compute the generating symbols of 
G, we perform the following induction. 


BASIS: Every symbol of T is obviously generating; it generates itself. 


INDUCTION: Suppose there is a production $A \to \alpha$, and every symbol of $\alpha$
is already known to be generating. Then A is generating. Note that this rule
includes the case where $\alpha = \epsilon$; all variables that have $\epsilon$ as a production body
are surely generating.
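The basis and induction translate directly into a fixpoint loop. This is a minimal Python sketch under our own conventions: productions are (head, body) pairs, bodies are tuples, and the empty tuple stands for an $\epsilon$ body. The name generating_symbols is hypothetical. The call at the end uses the grammar of Example 7.1.

```python
def generating_symbols(productions, terminals):
    """Fixpoint computation of the generating symbols.

    productions is a list of (head, body) pairs, body a tuple of symbols
    (the empty tuple stands for an epsilon body).
    """
    gen = set(terminals)                      # basis: terminals generate themselves
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # induction: every body symbol is already known to be generating
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return gen

example_7_1 = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(sorted(generating_symbols(example_7_1, {"a", "b"})))   # ['A', 'S', 'a', 'b']
```

Because all() of an empty body is true, a variable with an $\epsilon$ body is marked generating on the first pass, as the text notes.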


Example 7.3: Consider the grammar of Example 7.1. By the basis, a and b
are generating. For the induction, we can use the production $A \to b$ to conclude
that A is generating, and we can use the production $S \to a$ to conclude that
S is generating. At that point, the induction is finished. We cannot use the
production $S \to AB$, because B has not been established to be generating.
Thus, the set of generating symbols is {a, b, A, S}.


Theorem 7.4: The algorithm above finds all and only the generating symbols 
of G. 


PROOF: For one direction, it is an easy induction on the order in which symbols 
are added to the set of generating symbols that each symbol added really is 
generating. We leave to the reader this part of the proof. 

For the other direction, suppose X is a generating symbol, say $X \stackrel{*}{\Rightarrow}_G w$.
We prove by induction on the length of this derivation that X is found to be
generating.


BASIS: Zero steps. Then X is a terminal, and X is found in the basis.


INDUCTION: If the derivation takes n steps for n > 0, then X is a variable.
Let the derivation be $X \Rightarrow \alpha \stackrel{*}{\Rightarrow} w$; that is, the first production used is $X \to \alpha$.
Each symbol of $\alpha$ derives some terminal string that is a part of w, and that
derivation must take fewer than n steps. By the inductive hypothesis, each
symbol of $\alpha$ is found to be generating. The inductive part of the algorithm
allows us to use production $X \to \alpha$ to infer that X is generating.


Now, let us consider the inductive algorithm whereby we find the set of 
reachable symbols for the grammar G = (V,T, P, S). Again, we can show that 
by trying our best to discover reachable symbols, any symbol we do not add to 
the reachable set is really not reachable. 

BASIS: S is surely reachable.


INDUCTION: Suppose we have discovered that some variable A is reachable. 
Then for all productions with A in the head, all the symbols of the bodies of 
those productions are also reachable. 


Example 7.5: Again start with the grammar of Example 7.1. By the basis,
S is reachable. Since S has production bodies AB and a, we conclude that
A, B, and a are reachable. B has no productions, but A has $A \to b$. We
therefore conclude that b is reachable. Now, no more symbols can be added to
the reachable set, which is $\{S, A, B, a, b\}$.
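The reachability computation can be sketched the same way. This is an illustrative Python version under the same assumed `(head, body)` representation; it uses a worklist instead of repeated passes, which applies the same basis and induction rules. The name `reachable_symbols` is hypothetical.

```python
def reachable_symbols(start, productions):
    """Return every symbol reachable from `start` (worklist version of the
    basis/induction rules); productions is a list of (head, body) pairs."""
    reachable = {start}      # Basis: the start symbol is reachable.
    worklist = [start]
    while worklist:
        head = worklist.pop()
        # Induction: all symbols in the bodies of a reachable head are reachable.
        # Terminals have no productions, so popping them adds nothing.
        for h, body in productions:
            if h == head:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        worklist.append(s)
    return reachable


productions = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(sorted(reachable_symbols("S", productions)))  # ['A', 'B', 'S', 'a', 'b']
```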


Theorem 7.6: The algorithm above finds all and only the reachable symbols 
of G. 


PROOF: This proof is another pair of simple inductions akin to Theorem 7.4. 
We leave these arguments as an exercise. 


7.1.3 Eliminating $\epsilon$-Productions


Now, we shall show that $\epsilon$-productions, while a convenience in many grammar-design
problems, are not essential. Of course, without a production that has
an $\epsilon$ body, it is impossible to generate the empty string as a member of the
language. Thus, what we actually prove is that if language L has a CFG, then
$L - \{\epsilon\}$ has a CFG without $\epsilon$-productions. If $\epsilon$ is not in L, then L itself is
$L - \{\epsilon\}$, so L has a CFG without $\epsilon$-productions.

Our strategy is to begin by discovering which variables are “nullable.” A
variable A is nullable if $A \overset{*}{\Rightarrow} \epsilon$. If A is nullable, then whenever A appears in
a production body, say $B \to CAD$, A might (or might not) derive $\epsilon$. We make
two versions of the production, one without A in the body ($B \to CD$), which
corresponds to the case where A would have been used to derive $\epsilon$, and the
other with A still present ($B \to CAD$). However, if we use the version with A
present, then we cannot allow A to derive $\epsilon$. That proves not to be a problem,
since we shall simply eliminate all productions with $\epsilon$ bodies, thus preventing
any variable from deriving $\epsilon$.

Let G = (V,T, P, S) be a CFG. We can find all the nullable symbols of G by 
the following iterative algorithm. We shall then show that there are no nullable 
symbols except what the algorithm finds. 


BASIS: If $A \to \epsilon$ is a production of G, then A is nullable.


INDUCTION: If there is a production $B \to C_1C_2\cdots C_k$, where each $C_i$ is
nullable, then B is nullable. Note that each $C_i$ must be a variable to be nullable,
so we only have to consider productions with all-variable bodies.
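The nullable-symbols algorithm can be sketched in Python as follows. This is a minimal illustration, assuming the hypothetical `(head, body)` representation with `()` as the $\epsilon$ body; it is checked here against the grammar of Example 7.8.

```python
def nullable_symbols(productions):
    """Return the nullable variables; productions are (head, body) pairs and an
    epsilon-production has the empty tuple () as its body."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # Basis: () passes all() vacuously, so A -> epsilon makes A nullable.
            # Induction: a body made up entirely of nullable symbols. Terminals
            # never enter the set, so bodies containing one never qualify.
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


# Grammar of Example 7.8: S -> AB, A -> aAA | epsilon, B -> bBB | epsilon.
productions = [("S", ("A", "B")),
               ("A", ("a", "A", "A")), ("A", ()),
               ("B", ("b", "B", "B")), ("B", ())]
print(sorted(nullable_symbols(productions)))  # ['A', 'B', 'S']
```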


Theorem 7.7: In any grammar G, the only nullable symbols are the variables 
found by the algorithm above. 


266 CHAPTER 7. PROPERTIES OF CONTEXT-FREE LANGUAGES 


PROOF: For the “if” direction of the implied “A is nullable if and only if the
algorithm identifies A as nullable,” we simply observe, by an easy induction
on the order in which nullable symbols are discovered, that each such symbol
truly derives $\epsilon$. For the “only-if” part, we can perform an induction on the
length of the shortest derivation $A \overset{*}{\Rightarrow} \epsilon$.


BASIS: One step. Then $A \to \epsilon$ must be a production, and A is discovered in
the basis part of the algorithm.


INDUCTION: Suppose $A \overset{*}{\Rightarrow} \epsilon$ by n steps, where n > 1. The first step must
look like $A \Rightarrow C_1C_2\cdots C_k \overset{*}{\Rightarrow} \epsilon$, where each $C_i$ derives $\epsilon$ by a sequence of
fewer than n steps. By the inductive hypothesis, each $C_i$ is discovered by
the algorithm to be nullable. Thus, by the inductive step, A, thanks to the
production $A \to C_1C_2\cdots C_k$, is found to be nullable.


Now we give the construction of a grammar without $\epsilon$-productions. Let
G = (V, T, P, S) be a CFG. Determine all the nullable symbols of G. We
construct a new grammar $G_1 = (V, T, P_1, S)$, whose set of productions $P_1$ is
determined as follows.

For each production $A \to X_1X_2\cdots X_k$ of P, where $k \ge 1$, suppose that m
of the k $X_i$’s are nullable symbols. The new grammar $G_1$ will have $2^m$ versions
of this production, where the nullable $X_i$’s, in all possible combinations, are
present or absent. There is one exception: if m = k, i.e., all symbols are
nullable, then we do not include the case where all $X_i$’s are absent. Also, note
that if a production of the form $A \to \epsilon$ is in P, we do not place this production
in $P_1$.


Example 7.8: Consider the grammar

$S \to AB$
$A \to aAA \mid \epsilon$
$B \to bBB \mid \epsilon$


First, let us find the nullable symbols. A and B are directly nullable because
they have productions with $\epsilon$ as the body. Then, we find that S is nullable,
because the production $S \to AB$ has a body consisting of nullable symbols only.
Thus, all three variables are nullable.

Now, let us construct the productions of grammar $G_1$. First consider
$S \to AB$. All symbols of the body are nullable, so there are four ways we
could choose present or absent for A and B, independently. However, we are
not allowed to choose to make all symbols absent, so there are only three productions:

$S \to AB \mid A \mid B$


Next, consider production $A \to aAA$. The second and third positions hold
nullable symbols, so again there are four choices of present/absent. In this case,
all four choices are allowable, since the nonnullable symbol a will be present in
any case. Our four choices yield productions:

$A \to aAA \mid aA \mid aA \mid a$


Note that the two middle choices happen to yield the same production, since it
doesn’t matter which of the A’s we eliminate if we decide to eliminate one of
them. Thus, the final grammar $G_1$ will only have three productions for A.
Similarly, the production $B \to bBB$ yields for $G_1$:


$B \to bBB \mid bB \mid b$

The two $\epsilon$-productions of G yield nothing for $G_1$. Thus, the following productions:

$S \to AB \mid A \mid B$
$A \to aAA \mid aA \mid a$
$B \to bBB \mid bB \mid b$

constitute $G_1$.
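The construction of $P_1$ can be sketched in Python as below. It is an illustration under the same assumed `(head, body)` representation, not the book's own code: `itertools.product` enumerates the $2^m$ present/absent choices for the m nullable positions, and keeping the results in a set merges duplicate versions such as the two copies of $A \to aA$.

```python
from itertools import product


def nullable_symbols(productions):
    """Fixpoint computation of the nullable variables (() is an epsilon body)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Build the production set P1 of G1 from the productions P of G."""
    nullable = nullable_symbols(productions)
    new_productions = set()
    for head, body in productions:
        # A nullable position may be present or absent; others are always present.
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(sym for part in choice for sym in part)
            # Skip the all-absent version and the epsilon bodies of G itself.
            if new_body:
                new_productions.add((head, new_body))
    return new_productions  # a set, so duplicate versions such as aA collapse


productions = [("S", ("A", "B")),
               ("A", ("a", "A", "A")), ("A", ()),
               ("B", ("b", "B", "B")), ("B", ())]
for head, body in sorted(eliminate_epsilon(productions)):
    print(head, "->", "".join(body))
```

An $\epsilon$ body produces an empty `options` list, so `product()` yields a single empty choice, which the `if new_body` test discards, just as the construction drops $A \to \epsilon$.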


We conclude our study of the elimination of $\epsilon$-productions by proving that
the construction given above does not change the language, except that $\epsilon$ is no
longer present if it was in the language of G. Since the construction obviously
eliminates $\epsilon$-productions, we shall have a complete proof of the claim that for
every CFG G, there is a grammar $G_1$ with no $\epsilon$-productions, such that

$L(G_1) = L(G) - \{\epsilon\}$


Theorem 7.9: If the grammar $G_1$ is constructed from G by the above construction
for eliminating $\epsilon$-productions, then $L(G_1) = L(G) - \{\epsilon\}$.

PROOF: We must show that if $w \ne \epsilon$, then w is in $L(G_1)$ if and only if w
is in L(G). As is often the case, we find it easier to prove a more general
statement. In this case, we need to talk about the terminal strings that each
variable generates, even though we only care what the start symbol S generates.
Thus, we shall prove:

$A \overset{*}{\underset{G_1}{\Rightarrow}} w$ if and only if $A \overset{*}{\underset{G}{\Rightarrow}} w$ and $w \ne \epsilon$.

In each case, the proof is an induction on the length of the derivation.
(Only-if) Suppose that $A \overset{*}{\underset{G_1}{\Rightarrow}} w$. Then surely $w \ne \epsilon$, because $G_1$ has no
$\epsilon$-productions. We must show by induction on the length of the derivation that
$A \overset{*}{\underset{G}{\Rightarrow}} w$.


BASIS: One step. Then there is a production $A \to w$ in $G_1$. The construction
of $G_1$ tells us that there is some production $A \to \alpha$ of G, such that $\alpha$ is w, with
zero or more nullable variables interspersed. Then in G, $A \underset{G}{\Rightarrow} \alpha \overset{*}{\underset{G}{\Rightarrow}} w$, where
the steps after the first, if any, derive $\epsilon$ from whatever variables there are in $\alpha$.
INDUCTION: Suppose the derivation takes n > 1 steps. Then the derivation
looks like $A \underset{G_1}{\Rightarrow} X_1X_2\cdots X_k \overset{*}{\underset{G_1}{\Rightarrow}} w$. The first production used must come from
a production $A \to Y_1Y_2\cdots Y_m$, where the Y’s are the X’s, in order, with zero
or more additional, nullable variables interspersed. Also, we can break w into
$w_1w_2\cdots w_k$, where $X_i \overset{*}{\underset{G_1}{\Rightarrow}} w_i$ for $i = 1, 2, \ldots, k$. If $X_i$ is a terminal, then
$w_i = X_i$, and if $X_i$ is a variable, then the derivation $X_i \overset{*}{\underset{G_1}{\Rightarrow}} w_i$ takes fewer than
n steps. By the inductive hypothesis, we can conclude $X_i \overset{*}{\underset{G}{\Rightarrow}} w_i$.
Now, we construct a corresponding derivation in G as follows:

$A \underset{G}{\Rightarrow} Y_1Y_2\cdots Y_m \overset{*}{\underset{G}{\Rightarrow}} X_1X_2\cdots X_k \overset{*}{\underset{G}{\Rightarrow}} w_1w_2\cdots w_k = w$

The first step is application of the production $A \to Y_1Y_2\cdots Y_m$ that we know
exists in G. The next group of steps represents the derivation of $\epsilon$ from each
of the $Y_j$’s that is not one of the $X_i$’s. The final group of steps represents the
derivations of the $w_i$’s from the $X_i$’s, which we know exist by the inductive
hypothesis.


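(If) Suppose $A \overset{*}{\underset{G}{\Rightarrow}} w$ and $w \ne \epsilon$. We show by induction on the length n of the
derivation that $A \overset{*}{\underset{G_1}{\Rightarrow}} w$.

BASIS: One step. Then $A \to w$ is a production of G. Since $w \ne \epsilon$, this
production is also a production of $G_1$, and $A \overset{*}{\underset{G_1}{\Rightarrow}} w$.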


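INDUCTION: Suppose the derivation takes n > 1 steps. Then the derivation
looks like $A \underset{G}{\Rightarrow} Y_1Y_2\cdots Y_m \overset{*}{\underset{G}{\Rightarrow}} w$. We can break $w = w_1w_2\cdots w_m$, such that
$Y_i \overset{*}{\underset{G}{\Rightarrow}} w_i$ for $i = 1, 2, \ldots, m$. Let $X_1, X_2, \ldots, X_k$ be those of the $Y_j$’s, in order,
such that $w_j \ne \epsilon$. We must have $k \ge 1$, since $w \ne \epsilon$. Thus, $A \to X_1X_2\cdots X_k$
is a production of $G_1$.

We claim that $X_1X_2\cdots X_k \overset{*}{\underset{G_1}{\Rightarrow}} w$, since the only $Y_j$’s that are not present
among the X’s were used to derive $\epsilon$, and thus do not contribute to the derivation
of w. Since each of the derivations $Y_j \overset{*}{\underset{G}{\Rightarrow}} w_j$ takes fewer than n steps, we
may apply the inductive hypothesis and conclude that, if $w_j \ne \epsilon$, then $Y_j \overset{*}{\underset{G_1}{\Rightarrow}} w_j$.
Thus, $A \underset{G_1}{\Rightarrow} X_1X_2\cdots X_k \overset{*}{\underset{G_1}{\Rightarrow}} w$.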
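Now, we complete the proof as follows. We know w is in $L(G_1)$ if and only if
$S \overset{*}{\underset{G_1}{\Rightarrow}} w$. Letting A = S in the above, we know that w is in $L(G_1)$ if and only
if $S \overset{*}{\underset{G}{\Rightarrow}} w$ and $w \ne \epsilon$. That is, w is in $L(G_1)$ if and only if w is in L(G) and
$w \ne \epsilon$.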


7.1.4 Eliminating Unit Productions 


A unit production is a production of the form $A \to B$, where both A and B are
variables. These productions can be useful. For instance, in Example 5.27, we
saw how using unit productions $E \to T$ and $T \to F$ allowed us to create an
unambiguous grammar for simple arithmetic expressions:


$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to I \mid (E)$
$T \to F \mid T * F$
$E \to T \mid E + T$


However, unit productions can complicate certain proofs, and they also introduce
extra steps into derivations that technically need not be there. For
instance, we could expand the T in production $E \to T$ in both possible ways,
replacing it by the two productions $E \to F \mid T * F$. That change still doesn’t
eliminate unit productions, because we have introduced unit production $E \to F$
that was not previously part of the grammar. Further expanding $E \to F$ by
the two productions for F gives us $E \to I \mid (E) \mid T * F$. We still have a unit
production; it is $E \to I$. But if we further expand this I in all six possible ways,
we get:

$E \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1 \mid (E) \mid T * F \mid E + T$

Now the unit production for E is gone. Note that $E \to a$ is not a unit
production, since the lone symbol in the body is a terminal, rather than a
variable as is required for unit productions.

The technique suggested above (expand unit productions until they disappear)
often works. However, it can fail if there is a cycle of unit productions,
such as $A \to B$, $B \to C$, and $C \to A$. The technique that is guaranteed to work
involves first finding all those pairs of variables A and B such that $A \overset{*}{\Rightarrow} B$ using
a sequence of unit productions only. Note that it is possible for $A \overset{*}{\Rightarrow} B$ to
be true even though no unit productions are involved. For instance, we might
have productions $A \to BC$ and $C \to \epsilon$.

Once we have determined all such pairs, we can replace any sequence of
derivation steps in which $A \Rightarrow B_1 \Rightarrow B_2 \Rightarrow \cdots \Rightarrow B_n \Rightarrow \alpha$ by a production
that uses the nonunit production $B_n \to \alpha$ directly from A; that is, $A \to \alpha$. To
begin, here is the inductive construction of the pairs (A, B) such that $A \overset{*}{\Rightarrow} B$
using only unit productions. Call such a pair a unit pair.


BASIS: (A, A) is a unit pair for any variable A. That is, $A \overset{*}{\Rightarrow} A$ by zero steps.


INDUCTION: Suppose we have determined that (A, B) is a unit pair, and
$B \to C$ is a production, where C is a variable. Then (A, C) is a unit pair.


Example 7.10: Consider the expression grammar of Example 5.27, which we
reproduced above. The basis gives us the unit pairs (E, E), (T, T), (F, F), and
(I, I). For the inductive step, we can make the following inferences:

1. (E, E) and the production $E \to T$ gives us unit pair (E, T).
2. (E, T) and the production $T \to F$ gives us unit pair (E, F).
3. (E, F) and the production $F \to I$ gives us unit pair (E, I).
4. (T, T) and the production $T \to F$ gives us unit pair (T, F).
5. (T, F) and the production $F \to I$ gives us unit pair (T, I).
6. (F, F) and the production $F \to I$ gives us unit pair (F, I).


There are no more pairs that can be inferred, and in fact these ten pairs repre- 
sent all the derivations that use nothing but unit productions. 
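The unit-pair construction can be checked mechanically. The following is a minimal Python sketch, assuming the same hypothetical `(head, body)` representation; the helper `grammar` is also hypothetical and simply treats every character of a body string as one symbol, which works for the expression grammar because all its symbols are single characters.

```python
def unit_pairs(variables, productions):
    """Return all unit pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in variables}          # Basis: (A, A) for every A.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                # Induction: (A, B) plus a unit production B -> C gives (A, C).
                if head == b and len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def grammar(rules):
    """Hypothetical helper: {"E": ["T", "E+T"]} -> [("E", ("T",)), ("E", ("E", "+", "T"))].
    Every character of a body string is one symbol."""
    return [(head, tuple(body)) for head, bodies in rules.items() for body in bodies]


# Expression grammar of Example 5.27.
variables = {"E", "T", "F", "I"}
productions = grammar({"E": ["T", "E+T"], "T": ["F", "T*F"],
                       "F": ["I", "(E)"], "I": ["a", "b", "Ia", "Ib", "I0", "I1"]})
print(len(unit_pairs(variables, productions)))  # 10
```

The count of 10 agrees with the four basis pairs plus the six inferred pairs of Example 7.10.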


The pattern of development should by now be familiar. There is an easy 
proof that our proposed algorithm does get all the pairs we want. We then use 
the knowledge of those pairs to remove unit productions from a grammar and 
show that the language of the two grammars is the same. 


Theorem 7.11: The algorithm above finds exactly the unit pairs for a CFG 
G. 


PROOF: In one direction, it is an easy induction on the order in which the pairs
are discovered, that if (A, B) is found to be a unit pair, then $A \overset{*}{\underset{G}{\Rightarrow}} B$ using
only unit productions. We leave this part of the proof to you.
In the other direction, suppose that $A \overset{*}{\Rightarrow} B$ using unit productions only.
We can show by induction on the length of the derivation that the pair (A, B)
will be found.


BASIS: Zero steps. Then A = B, and the pair (A, B) is added in the basis. 


INDUCTION: Suppose $A \overset{*}{\Rightarrow} B$ using n steps, for some n > 0, each step being
the application of a unit production. Then the derivation looks like

$A \overset{*}{\Rightarrow} C \Rightarrow B$

The derivation $A \overset{*}{\Rightarrow} C$ takes n − 1 steps, so by the inductive hypothesis, we
discover the pair (A, C). Then the inductive part of the algorithm combines
the pair (A, C) with the production $C \to B$ to infer the pair (A, B).


To eliminate unit productions, we proceed as follows. Given a CFG G =
(V, T, P, S), construct CFG $G_1 = (V, T, P_1, S)$:

1. Find all the unit pairs of G.

2. For each unit pair (A, B), add to $P_1$ all the productions $A \to \alpha$, where
$B \to \alpha$ is a nonunit production in P. Note that A = B is possible; in
that way, $P_1$ contains all the nonunit productions in P.


Example 7.12: Let us continue with Example 7.10, which performed step (1) 
of the construction above for the expression grammar of Example 5.27. Fig- 
ure 7.1 summarizes step (2) of the algorithm, where we create the new set of 
productions by using the first member of a pair as the head and all the nonunit 
bodies for the second member of the pair as the production bodies. 

The final step is to eliminate the unit productions from the grammar of 
Fig. 7.1. The resulting grammar: 
Pair      Productions
(E, E)    $E \to E + T$
(E, T)    $E \to T * F$
(E, F)    $E \to (E)$
(E, I)    $E \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
(T, T)    $T \to T * F$
(T, F)    $T \to (E)$
(T, I)    $T \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
(F, F)    $F \to (E)$
(F, I)    $F \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
(I, I)    $I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$

Figure 7.1: Grammar constructed by step (2) of the unit-production-elimination
algorithm


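$E \to E + T \mid T * F \mid (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$T \to T * F \mid (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$

has no unit productions, yet generates the same set of expressions as the grammar
of Fig. 5.19.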
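Steps (1) and (2) of the construction can be sketched together in Python. This sketch, under the same assumed `(head, body)` representation and hypothetical `grammar` helper, swaps in a depth-first search over the unit productions to find the unit pairs; it reaches the same pairs as the inductive rule, just in a different order. It reproduces the grammar of Example 7.12.

```python
def grammar(rules):
    """Hypothetical helper: one character per symbol; "" would be an epsilon body."""
    return [(head, tuple(body)) for head, bodies in rules.items() for body in bodies]


def eliminate_unit_productions(variables, productions):
    """Steps (1) and (2) of the construction: returns the production set P1."""
    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # Step (1): unit pairs, found here by a graph search over unit productions.
    successors = {v: {body[0] for h, body in productions if h == v and is_unit(body)}
                  for v in variables}
    pairs = set()
    for a in variables:
        stack, seen = [a], {a}
        while stack:
            b = stack.pop()
            pairs.add((a, b))
            for c in successors[b] - seen:
                seen.add(c)
                stack.append(c)

    # Step (2): for each unit pair (A, B), copy B's nonunit bodies to A.
    return {(a, body) for a, b in pairs
            for head, body in productions if head == b and not is_unit(body)}


variables = {"E", "T", "F", "I"}
productions = grammar({"E": ["T", "E+T"], "T": ["F", "T*F"],
                       "F": ["I", "(E)"], "I": ["a", "b", "Ia", "Ib", "I0", "I1"]})
p1 = eliminate_unit_productions(variables, productions)
for v in "ETFI":
    print(v, "->", " | ".join(sorted("".join(b) for h, b in p1 if h == v)))
```

Because the `seen` set stops the search from revisiting a variable, a cycle such as $A \to B$, $B \to C$, $C \to A$ does not cause an infinite loop, which is exactly where naive expansion fails.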


Theorem 7.13: If grammar $G_1$ is constructed from grammar G by the algorithm
described above for eliminating unit productions, then $L(G_1) = L(G)$.


PROOF: We show that w is in L(G) if and only if w is in $L(G_1)$.


(If) Suppose $S \overset{*}{\underset{G_1}{\Rightarrow}} w$. Since every production of $G_1$ is equivalent to a sequence
of zero or more unit productions of G followed by a nonunit production of G,
we know that $\alpha \underset{G_1}{\Rightarrow} \beta$ implies $\alpha \overset{*}{\underset{G}{\Rightarrow}} \beta$. That is, every step of a derivation in $G_1$
can be replaced by one or more derivation steps in G. If we put these sequences
of steps together, we conclude that $S \overset{*}{\underset{G}{\Rightarrow}} w$.


(Only-if) Suppose now that w is in L(G). Then by the equivalences in Section 5.2,
we know that w has a leftmost derivation, i.e., $S \overset{*}{\underset{lm}{\Rightarrow}} w$. Whenever a
unit production is used in a leftmost derivation, the variable of the body becomes
the leftmost variable, and so is immediately replaced. Thus, the leftmost
derivation in grammar G can be broken into a sequence of steps in which zero
or more unit productions are followed by a nonunit production. Note that any
nonunit production that is not preceded by a unit production is a “step” by
itself. Each of these steps can be performed by one production of $G_1$, because
the construction of $G_1$ created exactly the productions that reflect zero or more
unit productions followed by a nonunit production. Thus, $S \overset{*}{\underset{G_1}{\Rightarrow}} w$.
We can now summarize the various simplifications described so far. We want
to convert any CFG G into an equivalent CFG that has no useless symbols,
$\epsilon$-productions, or unit productions. Some care must be taken in the order of
application of the constructions. A safe order is:


1. Eliminate $\epsilon$-productions.
2. Eliminate unit productions. 
3. Eliminate useless symbols. 


You should notice that, just as in Section 7.1.1, where we had to order the 
two steps properly or the result might have useless symbols, we must order the 
three steps above as shown, or the result might still have some of the features 
we thought we were eliminating. 


Theorem 7.14: If G is a CFG generating a language that contains at least one
string other than $\epsilon$, then there is another CFG $G_1$ such that $L(G_1) = L(G) - \{\epsilon\}$,
and $G_1$ has no $\epsilon$-productions, unit productions, or useless symbols.


PROOF: Start by eliminating the $\epsilon$-productions by the method of Section 7.1.3.
If we then eliminate unit productions by the method of Section 7.1.4, we do
not introduce any $\epsilon$-productions, since the bodies of the new productions are
each identical to some body of an old production. Finally, we eliminate useless
symbols by the method of Section 7.1.1. As this transformation only eliminates
productions and symbols, never introducing a new production, the resulting
grammar will still be devoid of $\epsilon$-productions and unit productions.


7.1.5 Chomsky Normal Form 


We complete our study of grammatical simplifications by showing that every
nonempty CFL without $\epsilon$ has a grammar G in which all productions are in one
of two simple forms, either:

1. $A \to BC$, where A, B, and C are each variables, or
2. $A \to a$, where A is a variable and a is a terminal.

Further, G has no useless symbols. Such a grammar is said to be in Chomsky
Normal Form, or CNF.¹

To put a grammar in CNF, start with one that satisfies the restrictions of
Theorem 7.14; that is, the grammar has no $\epsilon$-productions, unit productions,
or useless symbols. Every production of such a grammar is either of the form
$A \to a$, which is already in a form allowed by CNF, or it has a body of length
2 or more. Our tasks are to:

¹ N. Chomsky is the linguist who first proposed context-free grammars as a way to describe
natural languages, and who proved that every CFG could be converted to this form.
Interestingly, CNF does not appear to have important uses in natural linguistics, although
we shall see it has several other uses, such as an efficient test for membership of a string in a
context-free language (Section 7.4.4).
a) Arrange that all bodies of length 2 or more consist only of variables.


b) Break bodies of length 3 or more into a cascade of productions, each with 
a body consisting of two variables. 


The construction for (a) is as follows. For every terminal a that appears in
a body of length 2 or more, create a new variable, say A. This variable has only
one production, $A \to a$. Now, we use A in place of a everywhere a appears in
a body of length 2 or more. At this point, every production has a body that is
either a single terminal or at least two variables and no terminals.
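Step (a) can be sketched in Python as follows. This is an illustration under the assumed `(head, body)` representation, not the book's code; the mapping `names` from each terminal to its new variable is a hypothetical parameter, and the chosen names are assumed to be fresh, that is, not already variables of the grammar.

```python
def replace_terminals(variables, productions, names):
    """Step (a) of the CNF construction.

    names maps each terminal to the name of its new variable; those names are
    assumed to be fresh (not already variables of the grammar).
    """
    introduced = {}
    result = []
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for s in body:
                if s not in variables:          # a terminal inside a long body
                    introduced[s] = names[s]
                    s = names[s]
                new_body.append(s)
            result.append((head, tuple(new_body)))
        else:
            result.append((head, body))         # single-terminal bodies stay as-is
    # Each introduced variable gets its one production, e.g. P -> +.
    result.extend((var, (term,)) for term, var in introduced.items())
    return result


variables = {"E", "T", "F", "I"}
productions = [("E", ("E", "+", "T")), ("T", ("T", "*", "F")),
               ("F", ("(", "E", ")")), ("I", ("a",)), ("I", ("I", "0"))]
names = {"+": "P", "*": "M", "(": "L", ")": "R", "a": "A", "0": "Z"}
for head, body in replace_terminals(variables, productions, names):
    print(head, "->", " ".join(body))
```

In this small sample, a occurs only as a single-terminal body, so no production $A \to a$ is added; in the full grammar of Example 7.15, a also appears in the body Ia, which is why $A \to a$ is introduced there.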

For step (b), we must break those productions $A \to B_1B_2\cdots B_k$, for $k \ge 3$,
into a group of productions with two variables in each body. We introduce
k − 2 new variables, $C_1, C_2, \ldots, C_{k-2}$. The original production is replaced by
the k − 1 productions

$A \to B_1C_1,\quad C_1 \to B_2C_2,\quad \ldots,\quad C_{k-3} \to B_{k-2}C_{k-2},\quad C_{k-2} \to B_{k-1}B_k$
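A Python sketch of step (b) follows, again only an illustration under the assumed representation. Each new variable derives a fixed suffix $B_{i+1}\cdots B_k$ of a body, so a body of length k gets k − 2 new variables and is replaced by k − 1 productions. One design choice goes slightly beyond the text: the sketch memoizes variables by suffix, so productions whose bodies share a suffix share the new variable, much as Example 7.15 shares one variable per repeated body. Names of the form `C1`, `C2`, ... are assumed to be fresh.

```python
def binarize(productions, prefix="C"):
    """Step (b) of the CNF construction: split bodies of length k >= 3.

    Bodies are tuples of variables (step (a) already applied). New variables are
    named prefix + number, assumed fresh. Each new variable derives a fixed
    suffix of a body, so identical suffixes share one variable.
    """
    suffix_var = {}
    result = []

    def var_for(suffix):
        # Variable whose only production derives exactly `suffix` (length >= 2).
        if suffix not in suffix_var:
            name = f"{prefix}{len(suffix_var) + 1}"
            suffix_var[suffix] = name
            rest = suffix[1] if len(suffix) == 2 else var_for(suffix[1:])
            result.append((name, (suffix[0], rest)))
        return suffix_var[suffix]

    for head, body in productions:
        if len(body) <= 2:
            result.append((head, body))
        else:
            # A -> B1 C1, where C1 derives B2...Bk.
            result.append((head, (body[0], var_for(body[1:]))))
    return result


print(binarize([("A", ("B1", "B2", "B3", "B4"))]))
# [('C2', ('B3', 'B4')), ('C1', ('B2', 'C2')), ('A', ('B1', 'C1'))]
```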


Example 7.15: Let us convert the grammar of Example 7.12 to CNF. For
part (a), notice that there are eight terminals, a, b, 0, 1, +, *, (, and ), each of
which appears in a body that is not a single terminal. Thus, we must introduce
eight new variables, corresponding to these terminals, and eight productions in
which the new variable is replaced by its terminal. Using the obvious initials
as the new variables, we introduce:

$A \to a \qquad B \to b \qquad Z \to 0 \qquad O \to 1$
$P \to + \qquad M \to * \qquad L \to ( \qquad R \to )$


If we introduce these productions, and replace every terminal in a body that is 
other than a single terminal by the corresponding variable, we get the grammar 
shown in Fig. 7.2. 


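$E \to EPT \mid TMF \mid LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$T \to TMF \mid LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$F \to LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$I \to a \mid b \mid IA \mid IB \mid IZ \mid IO$
$A \to a$
$B \to b$
$Z \to 0$
$O \to 1$
$P \to +$
$M \to *$
$L \to ($
$R \to )$

Figure 7.2: Making all bodies either a single terminal or several variables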
Now, all productions are in Chomsky Normal Form except for those with
the bodies of length 3: EPT, TMF, and LER. Some of these bodies appear in
more than one production, but we can deal with each body once, introducing
one extra variable for each. For EPT, we introduce new variable $C_1$, and
replace the one production, $E \to EPT$, where it appears, by $E \to EC_1$ and
$C_1 \to PT$.

For TMF we introduce new variable $C_2$. The two productions that use this
body, $E \to TMF$ and $T \to TMF$, are replaced by $E \to TC_2$, $T \to TC_2$, and
$C_2 \to MF$. Then, for LER we introduce new variable $C_3$ and replace the three
productions that use it, $E \to LER$, $T \to LER$, and $F \to LER$, by $E \to LC_3$,
$T \to LC_3$, $F \to LC_3$, and $C_3 \to ER$. The final grammar, which is in CNF, is
shown in Fig. 7.3.


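$E \to EC_1 \mid TC_2 \mid LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$T \to TC_2 \mid LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$F \to LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IO$
$I \to a \mid b \mid IA \mid IB \mid IZ \mid IO$
$A \to a$
$B \to b$
$Z \to 0$
$O \to 1$
$P \to +$
$M \to *$
$L \to ($
$R \to )$
$C_1 \to PT$
$C_2 \to MF$
$C_3 \to ER$

Figure 7.3: Making all bodies either a single terminal or two variables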


Theorem 7.16: If G is a CFG whose language contains at least one string
other than $\epsilon$, then there is a grammar $G_1$ in Chomsky Normal Form, such that
$L(G_1) = L(G) - \{\epsilon\}$.


PROOF: By Theorem 7.14, we can find CFG $G_2$ such that $L(G_2) = L(G) - \{\epsilon\}$,
and such that $G_2$ has no useless symbols, $\epsilon$-productions, or unit productions.
The construction that converts $G_2$ to CNF grammar $G_1$ changes the productions
in such a way that each production of $G_2$ can be simulated by one or
more productions of $G_1$. Conversely, the introduced variables of $G_1$ each have
only one production, so they can only be used in the manner intended. More
formally, we prove that w is in $L(G_2)$ if and only if w is in $L(G_1)$.


(Only-if) If w has a derivation in $G_2$, it is easy to replace each production
used, say $A \to X_1X_2\cdots X_k$, by a sequence of productions of $G_1$. That is,
one step in the derivation in $G_2$ becomes one or more steps in the derivation
of w using the productions of $G_1$. First, if any $X_i$ is a terminal, we know
$G_1$ has a corresponding variable $B_i$ and a production $B_i \to X_i$. Then, if
k > 2, $G_1$ has productions $A \to B_1C_1$, $C_1 \to B_2C_2$, and so on, where $B_i$ is
either the introduced variable for terminal $X_i$ or $X_i$ itself, if $X_i$ is a variable.
These productions simulate in $G_1$ one step of a derivation of $G_2$ that uses
$A \to X_1X_2\cdots X_k$. We conclude that there is a derivation of w in $G_1$, so w is
in $L(G_1)$.


(If) Suppose w is in $L(G_1)$. Then there is a parse tree in $G_1$, with S at the
root and yield w. We convert this tree to a parse tree of $G_2$ that also has root
S and yield w.

First, we “undo” part (b) of the CNF construction. That is, suppose there
is a node labeled A, with two children labeled $B_1$ and $C_1$, where $C_1$ is one of the
variables introduced in part (b). Then this portion of the parse tree must look
like Fig. 7.4(a). That is, because these introduced variables each have only one
production, there is only one way that they can appear, and all the variables
introduced to handle the production $A \to B_1B_2\cdots B_k$ must appear together,
as shown.

Any such cluster of nodes in the parse tree may be replaced by the pro- 
duction that they represent. The parse-tree transformation is suggested by 
Fig. 7.4(b). 

The resulting parse tree is still not necessarily a parse tree of $G_2$. The
reason is that step (a) in the CNF construction introduced other variables that
derive single terminals. However, we can identify these in the current parse tree
and replace a node labeled by such a variable A and its one child labeled a,
by a single node labeled a. Now, every interior node of the parse tree forms a
production of $G_2$. Since w is the yield of a parse tree in $G_2$, we conclude that
w is in $L(G_2)$.


7.1.6 Exercises for Section 7.1 


* Exercise 7.1.1: Find a grammar equivalent to

$S \to AB \mid CA$
$A \to a$
$B \to BC \mid AB$
$C \to aB \mid b$

with no useless symbols.


* Exercise 7.1.2: Begin with the grammar:

$S \to ASB \mid \epsilon$
$A \to aAS \mid a$
$B \to SbS \mid A \mid bb$
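[Figure 7.4: (a) a node A with children $B_1$ and $C_1$, $C_1$ with children $B_2$ and $C_2$, and so on, ending with $C_{k-2}$ with children $B_{k-1}$ and $B_k$; (b) the same portion replaced by a single node A with children $B_1, B_2, \ldots, B_k$.]

Figure 7.4: A parse tree in $G_1$ must use introduced variables in a special way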
Greibach Normal Form

There is another interesting normal form for grammars that we shall not
prove. Every nonempty language without $\epsilon$ is L(G) for some grammar G
each of whose productions is of the form $A \to a\alpha$, where a is a terminal
and $\alpha$ is a string of zero or more variables. Converting a grammar to
this form is complex, even if we simplify the task by, say, starting with a
Chomsky-Normal-Form grammar. Roughly, we expand the first variable
of each production, until we get a terminal. However, because there can
be cycles, where we never reach a terminal, it is necessary to “short-circuit”
the process, creating a production that introduces a terminal as
the first symbol of the body and has variables following it to generate all
the sequences of variables that might have been generated on the way to
generation of that terminal.

This form, called Greibach Normal Form, after Sheila Greibach, who
first gave a way to construct such grammars, has several interesting consequences.
Since each use of a production introduces exactly one terminal
into a sentential form, a string of length n has a derivation of exactly
n steps. Also, if we apply the PDA construction of Theorem 6.13 to
a Greibach-Normal-Form grammar, then we get a PDA with no $\epsilon$-rules,
thus showing that it is always possible to eliminate such transitions of a
PDA.


a) Eliminate $\epsilon$-productions.

b) Eliminate any unit productions in the resulting grammar.

c) Eliminate any useless symbols in the resulting grammar.

d) Put the resulting grammar into Chomsky Normal Form.


Exercise 7.1.3: Repeat Exercise 7.1.2 for the following grammar: 


$S \to 0A0 \mid 1B1 \mid BB$
$A \to C$
$B \to S \mid A$
$C \to S \mid \epsilon$


Exercise 7.1.4: Repeat Exercise 7.1.2 for the following grammar: 


$S \to AAA \mid B$
$A \to aA \mid B$
$B \to \epsilon$


Exercise 7.1.5: Repeat Exercise 7.1.2 for the following grammar: 


$S \to aAa \mid bBb \mid \epsilon$
$A \to C \mid a$
$B \to C \mid b$
$C \to CDE \mid \epsilon$
$D \to A \mid B \mid ab$


Exercise 7.1.6: Design a CNF grammar for the set of strings of balanced 
parentheses. You need not start from any particular non-CNF grammar. 


Exercise 7.1.7: Suppose G is a CFG with p productions, and no production
body longer than n. Show that if $A \overset{*}{\underset{G}{\Rightarrow}} \epsilon$, then there is a derivation of $\epsilon$ from
A of no more than $(n^p - 1)/(n - 1)$ steps. How close can you actually come to
this bound?


Exercise 7.1.8: Let G be an $\epsilon$-production-free grammar whose total length of
production bodies is n. We convert G to CNF.

a) Show that the CNF grammar has at most $O(n^2)$ productions.

b) Show that it is possible for the CNF grammar to have a number of productions
proportional to $n^2$. Hint: Consider the construction that eliminates
unit productions.


Exercise 7.1.9: Provide the inductive proofs needed to complete the following 
theorems: 


a) The part of Theorem 7.4 where we show that discovered symbols really 
are generating. 


b) Both directions of Theorem 7.6, where we show the correctness of the 
algorithm in Section 7.1.2 for detecting the reachable symbols. 


c) The part of Theorem 7.11 where we show that all pairs discovered really 
are unit pairs. 


Exercise 7.1.10: Is it possible to find, for every context-free language without
$\epsilon$, a grammar such that all its productions are either of the form $A \to BCD$
(i.e., a body consisting of three variables), or $A \to a$ (i.e., a body consisting of
a single terminal)? Give either a proof or a counterexample.


Exercise 7.1.11: In this exercise, we shall show that for every context-free language
L containing at least one string other than $\epsilon$, there is a CFG in Greibach
normal form that generates $L - \{\epsilon\}$. Recall that a Greibach normal form (GNF)
grammar is one where every production body starts with a terminal. The construction
will be done using a series of lemmas and constructions.

a) Suppose that a CFG G has a production $A \to \alpha B\beta$, and all the productions
for B are $B \to \gamma_1 \mid \gamma_2 \mid \cdots \mid \gamma_n$. Then if we replace $A \to \alpha B\beta$ by
all the productions we get by substituting some body of a B-production
for B, that is, $A \to \alpha\gamma_1\beta \mid \alpha\gamma_2\beta \mid \cdots \mid \alpha\gamma_n\beta$, the resulting grammar
generates the same language as G.


7.2. THE PUMPING LEMMA FOR CONTEXT-FREE LANGUAGES 279 


In what follows, assume that the grammar G for L is in Chomsky Normal Form,
and that the variables are called $A_1, A_2, \ldots, A_k$.


*! b) Show that, by repeatedly using the transformation of part (a), we can
convert G to an equivalent grammar in which every production body for
$A_i$ either starts with a terminal or starts with $A_j$, for some $j \ge i$. In either
case, all symbols after the first in any production body are variables.


! c) Suppose $G_1$ is the grammar that we get by performing step (b) on G.
Suppose that $A_i$ is any variable, and let $A_i \to A_i\alpha_1 \mid \cdots \mid A_i\alpha_m$ be all
the $A_i$-productions that have a body beginning with $A_i$. Let

$A_i \to \beta_1 \mid \cdots \mid \beta_p$

be all the other $A_i$-productions. Note that each $\beta_j$ must start with either a
terminal or a variable with index higher than i. Introduce a new variable
$B_i$, and replace the first group of m productions by

$A_i \to \beta_1B_i \mid \cdots \mid \beta_pB_i$
$B_i \to \alpha_1B_i \mid \alpha_1 \mid \cdots \mid \alpha_mB_i \mid \alpha_m$

Prove that the resulting grammar generates the same language as G and
$G_1$.


*! d) Let $G_2$ be the grammar that results from step (c). Note that all the $A_i$
productions have bodies that begin with either a terminal or an $A_j$ for
j > i. Also, all the $B_i$ productions have bodies that begin with either a
terminal or some $A_j$. Prove that $G_2$ has an equivalent grammar in GNF.
Hint: First fix the productions for $A_k$, then $A_{k-1}$, and so on, down to
$A_1$, using part (a). Then fix the $B_i$ productions in any order, again using
part (a).


Exercise 7.1.12: Use the construction of Exercise 7.1.11 to convert the gram- 
mar 


$S \to AA \mid 0$
$A \to SS \mid 1$


to GNF. 


7.2 The Pumping Lemma for Context-Free Languages

Now, we shall develop a tool for showing that certain languages are not context-free.
The theorem, called the “pumping lemma for context-free languages,” says
that in any sufficiently long string in a CFL, it is possible to find at most two
short, nearby substrings that we can “pump” in tandem. That is, we may
repeat both of the strings i times, for any integer i, and the resulting string will
still be in the language.

We may contrast this theorem with the analogous pumping lemma for 
regular languages, Theorem 4.1, which says we can always find one small 
string to pump. The difference is seen when we consider a language like 
$L = \{0^n1^n \mid n \ge 1\}$. We can show it is not regular, by fixing n and pumping a
substring of 0’s, thus getting a string with more 0’s than 1’s. However, the CFL 
pumping lemma states only that we can find two small strings, so we might be 
forced to use a string of 0’s and a string of 1’s, thus generating only strings in 
L when we “pump.” That outcome is fortunate, because L is a CFL, and thus 
we should not be able to use the CFL pumping lemma to construct strings not 
in L. 


7.2.1 The Size of Parse Trees 


Our first step in deriving a pumping lemma for CFL’s is to examine the shape 
and size of parse trees. One of the uses of CNF is to turn parse trees into 
binary trees. These trees have some convenient properties, one of which we 
exploit here. 


Theorem 7.17: Suppose we have a parse tree according to a Chomsky-Normal-Form grammar G = (V, T, P, S), and suppose that the yield of the tree is a terminal string w. If the length of the longest path is n, then $|w| \le 2^{n-1}$.


PROOF: The proof is a simple induction on n. 


BASIS: n = 1. Recall that the length of a path in a tree is the number of edges, 
i.e., one less than the number of nodes. Thus, a tree with a maximum path 
length of 1 consists of only a root and one leaf labeled by a terminal. String w 
is this terminal, so |w| = 1. Since $2^{n-1} = 2^0 = 1$ in this case, we have proved
the basis. 


INDUCTION: Suppose the longest path has length n, and n > 1. The root of the tree uses a production, which must be of the form $A \to BC$, since n > 1; i.e., we could not start the tree using a production with a terminal. No path in the subtrees rooted at B and C can have length greater than n − 1, since these paths exclude the edge from the root to its child labeled B or C. Thus, by the inductive hypothesis, these two subtrees each have yields of length at most $2^{n-2}$. The yield of the entire tree is the concatenation of these two yields, and therefore has length at most $2^{n-2} + 2^{n-2} = 2^{n-1}$. Thus, the inductive step is proved.
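The bound of Theorem 7.17 can be checked mechanically on small trees. The sketch below uses a hypothetical encoding of CNF parse trees as nested tuples, chosen only for illustration: `(A, a)` is a node using a production $A \to a$, and `(A, left, right)` is a node using $A \to BC$. Path length counts edges, as in the proof. The helper `full_tree` (also illustrative) builds the complete tree of each height for the grammar $S \to SS \mid a$, which meets the bound with equality.

```python
def yield_of(tree):
    """Concatenate the terminals at the leaves, left to right."""
    if len(tree) == 2:            # (A, a): production A -> a
        return tree[1]
    return yield_of(tree[1]) + yield_of(tree[2])


def longest_path(tree):
    """Number of edges on the longest path from the root to a leaf."""
    if len(tree) == 2:            # one edge from A down to its terminal leaf
        return 1
    return 1 + max(longest_path(tree[1]), longest_path(tree[2]))


def full_tree(n):
    """Complete tree for the CNF grammar S -> SS | a whose longest path has n edges."""
    if n == 1:
        return ("S", "a")
    return ("S", full_tree(n - 1), full_tree(n - 1))


for n in range(1, 8):
    t = full_tree(n)
    assert longest_path(t) == n
    assert len(yield_of(t)) <= 2 ** (n - 1)   # the bound of Theorem 7.17
    print(n, len(yield_of(t)))                # prints n and 2^(n-1)
```

An unbalanced tree such as `("S", ("A", "a"), ("S", ("A", "a"), ("B", "b")))` has a longest path of 3 edges and a yield of length 3, below the bound of 4.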


7.2.2 Statement of the Pumping Lemma 


The pumping lemma for CFL’s is quite similar to the pumping lemma for 
regular languages, but we break each string z in the CFL L into five parts, and 
we pump the second and fourth, in tandem. 
Theorem 7.18: (The pumping lemma for context-free languages) Let L be a CFL. Then there exists a constant n such that if z is any string in L such that |z| is at least n, then we can write z = uvwxy, subject to the following conditions:

1. $|vwx| \le n$. That is, the middle portion is not too long.

2. $vx \ne \epsilon$. Since v and x are the pieces to be “pumped,” this condition says that at least one of the strings we pump must not be empty.

3. For all $i \ge 0$, $uv^iwx^iy$ is in L. That is, the two strings v and x may be “pumped” any number of times, including 0, and the resulting string will still be a member of L.


PROOF: Our first step is to find a Chomsky-Normal-Form grammar G for L. Technically, we cannot find such a grammar if L is the CFL $\emptyset$ or $\{\epsilon\}$. However, if $L = \emptyset$ then the statement of the theorem, which talks about a string z in L, surely cannot be violated, since there is no such z in $\emptyset$. Also, the CNF grammar G will actually generate $L - \{\epsilon\}$, but that is again not of importance, since we shall surely pick n > 0, in which case z cannot be $\epsilon$ anyway.

Now, starting with a CNF grammar G = (V, T, P, S) such that $L(G) = L - \{\epsilon\}$, let G have m variables. Choose $n = 2^m$. Next, suppose that z in L is of length at least n. By Theorem 7.17, any parse tree whose longest path is of length m or less must have a yield of length $2^{m-1} = n/2$ or less. Such a parse tree cannot have yield z, because z is too long. Thus, any parse tree with yield z has a path of length at least m + 1.


Figure 7.5: Every sufficiently long string in L must have a long path in its parse 
tree 


Figure 7.5 suggests the longest path in the tree for z, where k is at least m and the path is of length k + 1. Since $k \ge m$, there are at least m + 1 occurrences of variables $A_0, A_1, \ldots, A_k$ on the path. As there are only m different variables in V, at least two of the last m + 1 variables on the path (that is, $A_{k-m}$ through $A_k$, inclusive) must be the same variable. Suppose $A_i = A_j$, where $k - m \le i < j \le k$.


Figure 7.6: Dividing the string w so it can be pumped 


Then it is possible to divide the tree as shown in Fig. 7.6. String w is the yield of the subtree rooted at $A_j$. Strings v and x are the strings to the left and right, respectively, of w in the yield of the larger subtree rooted at $A_i$. Note that, since there are no unit productions, v and x could not both be $\epsilon$, although one could be. Finally, u and y are those portions of z that are to the left and right, respectively, of the subtree rooted at $A_i$.


If $A_i = A_j = A$, then we can construct new parse trees from the original tree, as suggested in Fig. 7.7(a). First, we may replace the subtree rooted at $A_i$, which has yield vwx, by the subtree rooted at $A_j$, which has yield w. The reason we can do so is that both of these trees have root labeled A. The resulting tree is suggested in Fig. 7.7(b); it has yield uwy and corresponds to the case i = 0 in the pattern of strings $uv^iwx^iy$.

Another option is suggested by Fig. 7.7(c). There, we have replaced the subtree rooted at $A_j$ by the entire subtree rooted at $A_i$. Again, the justification is that we are substituting one tree with root labeled A for another tree with the same root label. The yield of this tree is $uv^2wx^2y$. Were we to then replace the subtree of Fig. 7.7(c) with yield w by the larger subtree with yield vwx, we would have a tree with yield $uv^3wx^3y$, and so on, for any exponent i. Thus, there are parse trees in G for all strings of the form $uv^iwx^iy$, and we have almost proved the pumping lemma.

The remaining detail is condition (1), which says that $|vwx| \le n$. However, we picked $A_i$ to be close to the bottom of the tree; that is, $k - i \le m$. Thus, the longest path in the subtree rooted at $A_i$ is no greater than m + 1. By Theorem 7.17, the subtree rooted at $A_i$ has a yield whose length is no greater than $2^m = n$.




Figure 7.7: Pumping strings v and x zero times and pumping them twice (panels (a), (b), and (c))


7.2.3 Applications of the Pumping Lemma for CFL’s 


Notice that, like the earlier pumping lemma for regular languages, we use the 
CFL pumping lemma as an “adversary game,” as follows. 


1. We pick a language L that we want to show is not a CFL.

2. Our “adversary” gets to pick n, which we do not know, and we therefore must plan for any possible n.

3. We get to pick z, and may use n as a parameter when we do so.

4. Our adversary gets to break z into uvwxy, subject only to the constraints that $|vwx| \le n$ and $vx \ne \epsilon$.

5. We “win” the game, if we can, by picking i and showing that $uv^iwx^iy$ is not in L.
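For one fixed n and one fixed z, steps (4) and (5) can be played out by brute force. The sketch below (the helper names `splits`, `pump`, and `we_win` are illustrative, not from the text) enumerates every split the adversary may choose and asks whether some exponent i in a small, assumed range 0..`max_i` pushes the string out of L. It only illustrates the game for a single n; it is not a proof, since the real adversary may pick any n. Played against the CFL $\{0^n1^n \mid n \ge 1\}$ from the start of this section, the adversary keeps a surviving split, as the pumping lemma predicts.

```python
from typing import Callable, Iterator, Tuple

Split = Tuple[str, str, str, str, str]   # (u, v, w, x, y)


def splits(z: str, n: int) -> Iterator[Split]:
    """Every way the adversary may write z = uvwxy with |vwx| <= n and vx nonempty."""
    for start in range(len(z) + 1):                             # u = z[:start]
        for end in range(start, min(len(z), start + n) + 1):    # vwx = z[start:end]
            mid = z[start:end]
            for a in range(len(mid) + 1):                       # v = mid[:a]
                for b in range(a, len(mid) + 1):                # w = mid[a:b], x = mid[b:]
                    v, w, x = mid[:a], mid[a:b], mid[b:]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def pump(s: Split, i: int) -> str:
    """Return u v^i w x^i y."""
    u, v, w, x, y = s
    return u + v * i + w + x * i + y


def we_win(z: str, n: int, in_L: Callable[[str], bool], max_i: int = 3) -> bool:
    """True if every legal split leaves L for some i in 0..max_i (a finite check only)."""
    return all(any(not in_L(pump(s, i)) for i in range(max_i + 1))
               for s in splits(z, n))


def in_0n1n(s: str) -> bool:
    """Membership in {0^n 1^n | n >= 1}."""
    k = len(s) // 2
    return k >= 1 and s == "0" * k + "1" * k


print(we_win("00001111", 4, in_0n1n))  # False: e.g. u=000, v=0, w=ε, x=1, y=111 survives
```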


We shall now see some examples of languages that we can prove, using the 
pumping lemma, not to be context-free. Our first example shows that, while 
context-free languages can match two groups of symbols for equality or inequality, they cannot match three such groups.


Example 7.19: Let L be the language $\{0^n1^n2^n \mid n \ge 1\}$. That is, L consists of all strings in $0^+1^+2^+$ with an equal number of each symbol, e.g., 012, 001122, and so on. Suppose L were context-free. Then there is an integer n given to us by the pumping lemma.² Let us pick $z = 0^n1^n2^n$.

Suppose the “adversary” breaks z as z = uvwxy, where $|vwx| \le n$ and v and x are not both $\epsilon$. Then we know that vwx cannot involve both 0’s and 2’s, since the last 0 and the first 2 are separated by n + 1 positions. We shall prove that L contains some string known not to be in L, thus contradicting the assumption that L is a CFL. The cases are as follows:


1. vwx has no 2’s. Then vx consists of only 0’s and 1’s, and has at least one of these symbols. Then uwy, which would have to be in L by the pumping lemma, has n 2’s, but has fewer than n 0’s or fewer than n 1’s, or both. It therefore does not belong in L, and we conclude L is not a CFL in this case.

2. vwx has no 0’s. Similarly, uwy has n 0’s, but fewer 1’s or fewer 2’s. It therefore is not in L.


Whichever case holds, we conclude that L has a string we know not to be in L. 
This contradiction allows us to conclude that our assumption was wrong; L is 
not a CFL. 
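The case analysis can be spot-checked for one small value of the constant. The sketch below assumes n = 5 as a stand-in for the adversary's n (any small value would do for illustration) and confirms that, for every legal split of $z = 0^n1^n2^n$, the string uwy (the case i = 0) lies outside L, as cases (1) and (2) predict. The helper names `in_L` and `legal_splits` are illustrative.

```python
import re


def in_L(s: str) -> bool:
    """Membership in {0^n 1^n 2^n | n >= 1}."""
    m = re.fullmatch(r"(0+)(1+)(2+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def legal_splits(z: str, n: int):
    """Yield (u, v, w, x, y) with z = uvwxy, |vwx| <= n, and vx nonempty."""
    for start in range(len(z) + 1):
        for end in range(start, min(len(z), start + n) + 1):
            mid = z[start:end]
            for a in range(len(mid) + 1):
                for b in range(a, len(mid) + 1):
                    if a > 0 or b < len(mid):          # v or x is nonempty
                        yield z[:start], mid[:a], mid[a:b], mid[b:], z[end:]


n = 5                                    # stands in for the adversary's constant
z = "0" * n + "1" * n + "2" * n
survivors = [s for s in legal_splits(z, n) if in_L(s[0] + s[2] + s[4])]   # uwy
print(len(survivors))                    # 0: pumping down (i = 0) always leaves L
```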


Another thing that CFL’s cannot do is match two pairs of equal numbers 
of symbols, provided that the pairs interleave. The idea is made precise in the 
following example of a proof of non-context-freeness using the pumping lemma. 


Example 7.20: Let L be the language $\{0^i1^j2^i3^j \mid i \ge 1 \text{ and } j \ge 1\}$. If L is context-free, let n be the constant for L, and pick $z = 0^n1^n2^n3^n$. We may write z = uvwxy subject to the usual constraints $|vwx| \le n$ and $vx \ne \epsilon$. Then vwx is either contained in the substring of one symbol, or it straddles two adjacent symbols.

If vwx consists of only one symbol, then uwy has n of three different symbols and fewer than n of the fourth symbol. Thus, it cannot be in L. If vwx straddles
two symbols, say the 1’s and 2’s, then uwy is missing either some 1’s or some 
2’s, or both. Suppose it is missing 1’s. As there are n 3’s, this string cannot 
be in L. Similarly, if it is missing 2’s, then as it has n 0’s, uwy cannot be in L. 
We have contradicted the assumption that L is a CFL and conclude that it is 
not. 


As a final example, we shall show that CFL’s cannot match two strings of arbitrary length, if the strings are chosen from an alphabet of more than one symbol. An implication of this observation, incidentally, is that grammars are not a suitable mechanism for enforcing certain “semantic” constraints in programming languages, such as the common requirement that an identifier be declared before use. In practice, another mechanism, such as a “symbol table” is used to record declared identifiers, and we do not try to design a parser that, by itself, checks for “definition prior to use.”

²Remember that this n is the constant provided by the pumping lemma, and it has nothing to do with the local variable n used in the definition of L itself.


Example 7.21: Let $L = \{ww \mid w \text{ is in } \{0,1\}^*\}$. That is, L consists of repeating strings, such as $\epsilon$, 0101, 00100010, or 110110. If L is context-free, then let n be its pumping-lemma constant. Consider the string $z = 0^n1^n0^n1^n$. This string is $0^n1^n$ repeated, so z is in L.

Following the pattern of the previous examples, we can break z = uvwxy, such that $|vwx| \le n$ and $vx \ne \epsilon$. We shall show that uwy is not in L, and thus show L not to be a context-free language, by contradiction.

First, observe that, since $|vwx| \le n$, $|uwy| \ge 3n$. Thus, if uwy is some repeating string, say tt, then t is of length at least 3n/2. There are several cases to consider, depending on where vwx is within z.


1. Suppose vwx is within the first n 0’s. In particular, let vx consist of k 0’s, where k > 0. Then uwy begins with $0^{n-k}1^n$. Since $|uwy| = 4n - k$, we know that if uwy = tt, then $|t| = 2n - k/2$. Thus, t does not end until after the first block of 1’s; i.e., t ends in 0. But uwy ends in 1, and so it cannot equal tt.


2. Suppose vwx straddles the first block of 0’s and the first block of 1’s. It may be that vx consists only of 0’s, if $x = \epsilon$. Then, the argument that uwy is not of the form tt is the same as case (1). If vx has at least one 1, then we note that t, which is of length at least 3n/2, must end in $1^n$, because uwy ends in $1^n$. However, there is no block of n 1’s except the final block, so t cannot repeat in uwy.


3. If vwx is contained in the first block of 1’s, then the argument that uwy 
is not in L is like the second part of case (2). 


4. Suppose vwx straddles the first block of 1’s and the second block of 0’s. If vx actually has no 0’s, then the argument is the same as if vwx were contained in the first block of 1’s. If vx has at least one 0, then uwy starts with a block of n 0’s, and so does t if uwy = tt. However, there is no other block of n 0’s in uwy for the second copy of t. We conclude in this case too, that uwy is not in L.


5. In the other cases, where vwx is in the second half of z, the argument is symmetric to the cases where vwx is contained in the first half of z.


Thus, in no case is uwy in L, and we conclude that L is not context-free. 
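As with Example 7.19, the five cases can be spot-checked for a small n. Assuming n = 4 stands in for the pumping-lemma constant, the sketch below enumerates every legal split of $z = 0^n1^n0^n1^n$ and confirms that uwy is never a repeated string tt. The helper names `is_square` and `legal_splits` are illustrative, not from the text, and a finite check illustrates the argument rather than proving it.

```python
def is_square(s: str) -> bool:
    """True if s = tt for some string t, i.e., s is in {ww}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def legal_splits(z: str, n: int):
    """Yield (u, v, w, x, y) with z = uvwxy, |vwx| <= n, and vx nonempty."""
    for start in range(len(z) + 1):
        for end in range(start, min(len(z), start + n) + 1):
            mid = z[start:end]
            for a in range(len(mid) + 1):
                for b in range(a, len(mid) + 1):
                    if a > 0 or b < len(mid):          # v or x is nonempty
                        yield z[:start], mid[:a], mid[a:b], mid[b:], z[end:]


n = 4                                          # a small stand-in for the constant
z = ("0" * n + "1" * n) * 2                    # z = 0^n 1^n 0^n 1^n
assert is_square(z)
survivors = [s for s in legal_splits(z, n) if is_square(s[0] + s[2] + s[4])]
print(len(survivors))                          # 0: uwy is never of the form tt
```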




7.2.4 Exercises for Section 7.2 


Exercise 7.2.1: Use the CFL pumping lemma to show each of these languages 
not to be context-free: 


* a) $\{a^ib^jc^k \mid i < j < k\}$.
b) $\{a^nb^nc^i \mid i \le n\}$.

c) $\{0^p \mid p \text{ is a prime}\}$. Hint: Adapt the same ideas used in Example 4.3, which showed this language not to be regular.

*! d) $\{0^i1^j \mid j = i^2\}$.
! e) $\{a^nb^nc^i \mid n \le i \le 2n\}$.

! f) $\{ww^Rw \mid w \text{ is a string of 0's and 1's}\}$. That is, the set of strings consisting of some string w followed by the same string in reverse, and then the string w again, such as 001100001.


Exercise 7.2.2: When we try to apply the pumping lemma to a CFL, the 
“adversary wins,” and we cannot complete the proof. Show what goes wrong 
when we choose L to be one of the following languages: 


a) {00,11}. 
* b) $\{0^n1^n \mid n \ge 1\}$.
* c) The set of palindromes over alphabet {0, 1}. 


Exercise 7.2.3: There is a stronger version of the CFL pumping lemma known 
as Ogden’s lemma. It differs from the pumping lemma we proved by allowing 
us to focus on any n “distinguished” positions of a string z and guaranteeing 
that the strings to be pumped have between 1 and n distinguished positions. 
The advantage of this ability is that a language may have strings consisting 
of two parts, one of which can be pumped without producing strings not in 
the language, while the other does produce strings outside the language when 
pumped. Without being able to insist that the pumping take place in the latter 
part, we cannot complete a proof of non-context-freeness. The formal statement 
of Ogden’s lemma is: If L is a CFL, then there is a constant n, such that if z 
is any string of length at least n in L, in which we select at least n positions to 
be distinguished, then we can write z = uvwxy, such that:

1. vwx has at most n distinguished positions.
2. vx has at least one distinguished position.
3. For all i, $uv^iwx^iy$ is in L.


Prove Ogden’s lemma. Hint: The proof is really the same as that of the pump- 
ing lemma of Theorem 7.18 if we pretend that the nondistinguished positions 
of z are not present as we select a long path in the parse tree for z. 
Exercise 7.2.4: Use Ogden’s lemma (Exercise 7.2.3) to simplify the proof in Example 7.21 that $L = \{ww \mid w \text{ is in } \{0,1\}^*\}$ is not a CFL. Hint: With $z = 0^n1^n0^n1^n$, make the two middle blocks distinguished.


Exercise 7.2.5: Use Ogden’s lemma (Exercise 7.2.3) to show the following 
languages are not CFL’s: 


! a) $\{0^i1^j0^k \mid j = \max(i, k)\}$.

!! b) $\{a^nb^nc^i \mid i \ne n\}$. Hint: If n is the constant for Ogden’s lemma, consider the string $z = a^nb^nc^{n+n!}$.


7.3 Closure Properties of Context-Free Languages


We shall now consider some of the operations on context-free languages that 
are guaranteed to produce a CFL. Many of these closure properties will parallel 
the theorems we had for regular languages in Section 4.2. However, there are 
some differences. 

First, we introduce an operation called substitution, in which we replace each 
symbol in the strings of one language by an entire language. This operation, a 
generalization of the homomorphism that we studied in Section 4.2.3, is useful in 
proving some other closure properties of CFL’s, such as the regular-expression 
operations: union, concatenation, and closure. We show that CFL’s are closed 
under homomorphisms and inverse homomorphisms. Unlike the regular lan- 
guages, the CFL’s are not closed under intersection or difference. However, the 
intersection or difference of a CFL and a regular language is always a CFL. 


7.3.1 Substitutions 


Let $\Sigma$ be an alphabet, and suppose that for every symbol a in $\Sigma$, we choose a language $L_a$. These chosen languages can be over any alphabets, not necessarily $\Sigma$ and not necessarily the same. This choice of languages defines a function s (a substitution) on $\Sigma$, and we shall refer to $L_a$ as s(a) for each symbol a.

If $w = a_1a_2\cdots a_n$ is a string in $\Sigma^*$, then s(w) is the language of all strings $x_1x_2\cdots x_n$ such that string $x_i$ is in the language $s(a_i)$, for i = 1, 2, ..., n. Put another way, s(w) is the concatenation of the languages $s(a_1)s(a_2)\cdots s(a_n)$. We can further extend the definition of s to apply to languages: s(L) is the union of s(w) for all strings w in L.


Example 7.22: Suppose $s(0) = \{a^nb^n \mid n \ge 1\}$ and $s(1) = \{aa, bb\}$. That is, s is a substitution on alphabet $\Sigma = \{0,1\}$. Language s(0) is the set of strings with one or more a’s followed by an equal number of b’s, while s(1) is the finite language consisting of the two strings aa and bb.

Let w = 01. Then s(w) is the concatenation of the languages s(0)s(1). To be exact, s(w) consists of all strings of the forms $a^nb^naa$ and $a^nb^{n+2}$, where $n \ge 1$.

Now, suppose $L = L(0^*)$, that is, the set of all strings of 0’s. Then $s(L) = (s(0))^*$. This language is the set of all strings of the form

$$a^{n_1}b^{n_1}a^{n_2}b^{n_2}\cdots a^{n_k}b^{n_k}$$

for some $k \ge 0$ and any sequence of choices of positive integers $n_1, n_2, \ldots, n_k$. It includes strings such as $\epsilon$, aabbaaabbb, and abaabbabab.
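The definitions of s(w) and s(L) translate directly into code once the languages involved are finite. The sketch below assumes a truncated version of this example's substitution, keeping only $a^nb^n$ with $n \le 3$ in s(0), since the real s(0) is infinite. Under that assumption, s(w) is a product of finite sets, and s(L) is a union over the strings of a finite L. The names `subst_string` and `subst_language` are illustrative.

```python
from itertools import product

# This example's substitution, with s(0) truncated to n <= 3 so every set is finite.
s = {
    "0": {"a" * n + "b" * n for n in range(1, 4)},
    "1": {"aa", "bb"},
}


def subst_string(w: str) -> set:
    """s(w) = s(a1)s(a2)...s(an): pick one x_i from each s(a_i) and concatenate."""
    return {"".join(choice) for choice in product(*(s[a] for a in w))}


def subst_language(L) -> set:
    """s(L) is the union of s(w) over all strings w in L."""
    result = set()
    for w in L:
        result |= subst_string(w)
    return result


print(sorted(subst_string("01")))
# ['aaabbbaa', 'aaabbbbb', 'aabbaa', 'aabbbb', 'abaa', 'abbb']
```

Applying `subst_language` to the strings of $0^*$ of length at most 4 gives a finite part of $(s(0))^*$ that contains $\epsilon$, aabbaaabbb, and abaabbabab.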


Theorem 7.23: If L is a context-free language over alphabet $\Sigma$, and s is a substitution on $\Sigma$ such that s(a) is a CFL for each a in $\Sigma$, then s(L) is a CFL.


PROOF: The essential idea is that we may take a CFG for L and replace each 
terminal a by the start symbol of a CFG for language s(a). The result is a 
single CFG that generates s(L). However, there are a few details that must be 
gotten right to make this idea work. 

More formally, start with grammars for each of the relevant languages, say $G = (V, \Sigma, P, S)$ for L and $G_a = (V_a, T_a, P_a, S_a)$ for each a in $\Sigma$. Since we can choose any names we wish for variables, let us make sure that the sets of variables are disjoint; that is, there is no symbol A that is in two or more of V and any of the $V_a$’s. The purpose of this choice of names is to make sure that when we combine the productions of the various grammars into one set of productions, we cannot get accidental mixing of the productions from two grammars and thus have derivations that do not resemble the derivations in any of the given grammars.

We construct a new grammar $G' = (V', T', P', S)$ for s(L), as follows:

• $V'$ is the union of V and all the $V_a$’s for a in $\Sigma$.
• $T'$ is the union of all the $T_a$’s for a in $\Sigma$.
• $P'$ consists of:

  1. All productions in any $P_a$, for a in $\Sigma$.

  2. The productions of P, but with each terminal a in their bodies replaced by $S_a$, everywhere a occurs.


Thus, all parse trees in grammar $G'$ start out like parse trees in G, but instead of generating a yield in $\Sigma^*$, there is a frontier in the tree where all nodes have labels that are $S_a$ for some a in $\Sigma$. Then, dangling from each such node is a parse tree of $G_a$, whose yield is a terminal string that is in the language s(a). The typical parse tree is suggested in Fig. 7.8.
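The construction of $P'$ is a purely syntactic rewrite, so it can be sketched in a few lines. The sketch below assumes a hypothetical dictionary encoding of grammars (each variable maps to a list of bodies, given as tuples of symbols; any symbol that is not a variable is a terminal) and assumes the variable names are already disjoint, as the proof requires. The helper `language`, also illustrative, lists the short terminal strings a grammar derives, and assumes there are no $\epsilon$-productions, so sentential forms never shrink. As test data it takes L = $\{0^n1^n \mid n \ge 1\}$ with the substitution of Example 7.22.

```python
def substitute_grammar(G, sub):
    """Productions P' of Theorem 7.23.

    G:   grammar for L, a dict variable -> list of bodies (tuples of symbols).
    sub: dict terminal a -> (G_a, S_a), a grammar for s(a) and its start symbol.
    Variable names of G and of every G_a are assumed to be disjoint.
    """
    P = {}
    # Rule 2: the productions of G, with each terminal a replaced by S_a.
    for A, bodies in G.items():
        P[A] = [tuple(X if X in G else sub[X][1] for X in body) for body in bodies]
    # Rule 1: every production of every G_a, unchanged.
    for Ga, _ in sub.values():
        for A, bodies in Ga.items():
            P[A] = list(bodies)
    return P


def language(P, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Explores leftmost derivations; pruning long sentential forms is safe only
    because these grammars have no epsilon-productions.
    """
    seen, todo, out = set(), [(start,)], set()
    while todo:
        form = todo.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        i = next((k for k, X in enumerate(form) if X in P), None)
        if i is None:                       # no variables left: a terminal string
            out.add("".join(form))
            continue
        for body in P[form[i]]:             # expand the leftmost variable
            todo.append(form[:i] + tuple(body) + form[i + 1:])
    return out


G = {"S": [("0", "S", "1"), ("0", "1")]}                    # L = {0^n 1^n | n >= 1}
sub = {
    "0": ({"S0": [("a", "S0", "b"), ("a", "b")]}, "S0"),   # s(0) = {a^n b^n | n >= 1}
    "1": ({"S1": [("a", "a"), ("b", "b")]}, "S1"),         # s(1) = {aa, bb}
}
P = substitute_grammar(G, sub)
print(sorted(language(P, "S", 6)))   # ['aabbaa', 'aabbbb', 'abaa', 'abbb']
```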

Now, we must prove that this construction works, in the sense that $G'$ generates the language s(L). Formally:

• A string w is in $L(G')$ if and only if w is in s(L).
Figure 7.8: A parse tree in $G'$ begins with a parse tree in G and finishes with many parse trees, each in one of the grammars $G_a$


(If) Suppose w is in s(L). Then there is some string $x = a_1a_2\cdots a_n$ in L, and strings $x_i$ in $s(a_i)$ for i = 1, 2, ..., n, such that $w = x_1x_2\cdots x_n$. Then the portion of $G'$ that comes from the productions of G with $S_a$ substituted for each a will generate a string that looks like x, but with $S_a$ in place of each a. This string is $S_{a_1}S_{a_2}\cdots S_{a_n}$. This part of the derivation of w is suggested by the upper triangle in Fig. 7.8.

Since the productions of each $G_a$ are also productions of $G'$, the derivation of $x_i$ from $S_{a_i}$ is also a derivation in $G'$. The parse trees for these derivations are suggested by the lower triangles in Fig. 7.8. Since the yield of this parse tree of $G'$ is $x_1x_2\cdots x_n = w$, we conclude that w is in $L(G')$.


(Only-if) Now suppose w is in $L(G')$. We claim that the parse tree for w must look like the tree of Fig. 7.8. The reason is that the variables of each of the grammars G and $G_a$ for a in $\Sigma$ are disjoint. Thus, the top of the tree, starting from variable S, must use only productions of G until some symbol $S_a$ is derived, and below that $S_a$, only productions of grammar $G_a$ may be used. As a result, whenever w has a parse tree T, we can identify a string $a_1a_2\cdots a_n$ in L(G), and strings $x_i$ in language $s(a_i)$, such that

1. $w = x_1x_2\cdots x_n$, and

2. The string $S_{a_1}S_{a_2}\cdots S_{a_n}$ is the yield of a tree that is formed from T by deleting some subtrees (as suggested by Fig. 7.8).

But the string $x_1x_2\cdots x_n$ is in s(L), since it is formed by substituting strings $x_i$ for each of the $a_i$’s. Thus, we conclude w is in s(L).


7.3.2 Applications of the Substitution Theorem 


There are several familiar closure properties, which we studied for regular lan- 
guages, that we can show for CFL’s using Theorem 7.23. We shall list them all 
in one theorem. 
Theorem 7.24: The context-free languages are closed under the following operations:

1. Union.

2. Concatenation.

3. Closure (*), and positive closure (+).

4. Homomorphism.


PROOF: Each requires only that we set up the proper substitution. The proofs 
below each involve substitution of context-free languages into other context-free 
languages, and therefore produce CFL’s by Theorem 7.23. 


1. Union: Let $L_1$ and $L_2$ be CFL’s. Then $L_1 \cup L_2$ is the language s(L), where L is the language {1, 2}, and s is the substitution defined by $s(1) = L_1$ and $s(2) = L_2$.

2. Concatenation: Again let $L_1$ and $L_2$ be CFL’s. Then $L_1L_2$ is the language s(L), where L is the language {12}, and s is the same substitution as in case (1).

3. Closure and positive closure: If $L_1$ is a CFL, L is the language $\{1\}^*$, and s is the substitution $s(1) = L_1$, then $L_1^* = s(L)$. Similarly, if L is instead the language $\{1\}^+$, then $L_1^+ = s(L)$.

4. Homomorphism: Suppose L is a CFL over alphabet $\Sigma$, and h is a homomorphism on $\Sigma$. Let s be the substitution that replaces each symbol a in $\Sigma$ by the language consisting of the one string that is h(a). That is, $s(a) = \{h(a)\}$, for all a in $\Sigma$. Then h(L) = s(L).


7.3.3 Reversal 


The CFL’s are also closed under reversal. We cannot use the substitution 
theorem, but there is a simple construction using grammars. 


Theorem 7.25: If L is a CFL, then so is $L^R$.

PROOF: Let L = L(G) for some CFG G = (V, T, P, S). Construct $G^R = (V, T, P^R, S)$, where $P^R$ is the “reverse” of each production in P. That is, if $A \to \alpha$ is a production of G, then $A \to \alpha^R$ is a production of $G^R$. It is an easy induction on the lengths of derivations in G and $G^R$ to show that $L(G^R) = L^R$. Essentially, all the sentential forms of $G^R$ are reverses of sentential forms of G, and vice-versa. We leave the formal proof as an exercise.
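The reversal construction is equally mechanical. The sketch below assumes a hypothetical dictionary encoding of grammars (each variable maps to a list of bodies, given as tuples of symbols), reverses every body, and compares the short strings derived by both grammars. As an assumed test case it takes the grammar S → AB, A → 0A | 0, B → 1B2 | 12, the grammar for $L_2$ in Example 7.26. The enumerator `strings_upto` is illustrative and assumes no $\epsilon$-productions.

```python
def reverse_grammar(P):
    """P^R: every production A -> alpha becomes A -> alpha reversed."""
    return {A: [tuple(reversed(body)) for body in bodies] for A, bodies in P.items()}


def strings_upto(P, start, max_len):
    """Terminal strings of length <= max_len derivable from start (no epsilon-productions assumed)."""
    seen, todo, out = set(), [(start,)], set()
    while todo:
        form = todo.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        i = next((k for k, X in enumerate(form) if X in P), None)
        if i is None:
            out.add("".join(form))
        else:
            for body in P[form[i]]:          # expand the leftmost variable
                todo.append(form[:i] + tuple(body) + form[i + 1:])
    return out


P = {"S": [("A", "B")],
     "A": [("0", "A"), ("0",)],
     "B": [("1", "B", "2"), ("1", "2")]}
PR = reverse_grammar(P)
print(PR["B"])        # [('2', 'B', '1'), ('2', '1')]
assert strings_upto(PR, "S", 8) == {w[::-1] for w in strings_upto(P, "S", 8)}
```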
7.3.4 Intersection With a Regular Language


The CFL’s are not closed under intersection. Here is a simple example that 
proves they are not. 


Example 7.26: We learned in Example 7.19 that the language

$$L = \{0^n1^n2^n \mid n \ge 1\}$$

is not a context-free language. However, the following two languages are context-free:

$$L_1 = \{0^n1^n2^i \mid n \ge 1, i \ge 1\}$$
$$L_2 = \{0^i1^n2^n \mid n \ge 1, i \ge 1\}$$

A grammar for $L_1$ is:

S → AB
A → 0A1 | 01
B → 2B | 2

In this grammar, A generates all strings of the form $0^n1^n$, and B generates all strings of 2’s. A grammar for $L_2$ is:

S → AB
A → 0A | 0
B → 1B2 | 12

It works similarly, but with A generating any string of 0’s, and B generating 
matching strings of 1’s and 2’s. 

However, $L = L_1 \cap L_2$. To see why, observe that $L_1$ requires that there be the same number of 0’s and 1’s, while $L_2$ requires the numbers of 1’s and 2’s to be equal. A string in both languages must have equal numbers of all three symbols and thus be in L.

If the CFL’s were closed under intersection, then we could prove the false 
statement that L is context-free. We conclude by contradiction that the CFL’s 
are not closed under intersection. 
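The identity $L = L_1 \cap L_2$ can be checked on short strings. The sketch below writes membership tests straight from the set-builder definitions of L, $L_1$, and $L_2$ (not from the grammars; the function names are illustrative) and confirms that $L_1 \cap L_2$ and L agree on every string over {0, 1, 2} of length at most 9. A finite check like this illustrates the argument but does not replace it.

```python
import re
from itertools import product


def block_lengths(s):
    """(#0's, #1's, #2's) if s is in 0+1+2+, otherwise None."""
    m = re.fullmatch(r"(0+)(1+)(2+)", s)
    return tuple(len(g) for g in m.groups()) if m else None


def in_L1(s):   # {0^n 1^n 2^i | n >= 1, i >= 1}
    c = block_lengths(s)
    return c is not None and c[0] == c[1]


def in_L2(s):   # {0^i 1^n 2^n | n >= 1, i >= 1}
    c = block_lengths(s)
    return c is not None and c[1] == c[2]


def in_L(s):    # {0^n 1^n 2^n | n >= 1}
    c = block_lengths(s)
    return c is not None and c[0] == c[1] == c[2]


for length in range(10):
    for letters in product("012", repeat=length):
        s = "".join(letters)
        assert (in_L1(s) and in_L2(s)) == in_L(s)
print("L1 & L2 and L agree on all strings of length <= 9")
```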


On the other hand, there is a weaker claim we can make about intersection. 
The context-free languages are closed under the operation of “intersection with 
a regular language.” The formal statement and proof is in the next theorem. 


Theorem 7.27: If L is a CFL and R is a regular language, then $L \cap R$ is a CFL.
Figure 7.9: A PDA and a FA can run in parallel to create a new PDA


PROOF: This proof requires the pushdown-automaton representation of CFL’s, as well as the finite-automaton representation of regular languages, and generalizes the proof of Theorem 4.8, where we ran two finite automata “in parallel” to get the intersection of their languages. Here, we run a finite automaton “in parallel” with a PDA, and the result is another PDA, as suggested in Fig. 7.9.

Formally, let

$$P = (Q_P, \Sigma, \Gamma, \delta_P, q_P, Z_0, F_P)$$

be a PDA that accepts L by final state, and let

$$A = (Q_A, \Sigma, \delta_A, q_A, F_A)$$

be a DFA for R. Construct PDA

$$P' = (Q_P \times Q_A, \Sigma, \Gamma, \delta, (q_P, q_A), Z_0, F_P \times F_A)$$

where $\delta((q, p), a, X)$ is defined to be the set of all pairs $((r, s), \gamma)$ such that:

1. $s = \hat{\delta}_A(p, a)$, and
2. Pair $(r, \gamma)$ is in $\delta_P(q, a, X)$.

That is, for each move of PDA P, we can make the same move in PDA $P'$, and in addition, we carry along the state of the DFA A in a second component of the state of $P'$. Note that a may be a symbol of $\Sigma$, or $a = \epsilon$. In the former case, $\hat{\delta}_A(p, a) = \delta_A(p, a)$, while if $a = \epsilon$, then $\hat{\delta}_A(p, a) = p$; i.e., A does not change state while P makes moves on $\epsilon$ input.

It is an easy induction on the numbers of moves made by the PDA’s that $(q_P, w, Z_0) \vdash_P^* (q, \epsilon, \gamma)$ if and only if $((q_P, q_A), w, Z_0) \vdash_{P'}^* ((q, p), \epsilon, \gamma)$, where $p = \hat{\delta}_A(q_A, w)$. We leave these inductions as exercises. Since (q, p) is an accepting state of $P'$ if and only if q is an accepting state of P, and p is an accepting state of A, we conclude that $P'$ accepts w if and only if both P and A do; i.e., w is in $L \cap R$.


Example 7.28: In Fig. 6.6 we designed a PDA called F to accept by final 
state the set of strings of i’s and e’s that represent minimal violations of the 
rule regarding how if’s and else’s may appear in C programs. Call this language 
L. The PDA F was defined by 


$$P_F = (\{p, q, r\}, \{i, e\}, \{Z, X_0\}, \delta_F, p, X_0, \{r\})$$

where $\delta_F$ consists of the rules:

1. $\delta_F(p, \epsilon, X_0) = \{(q, ZX_0)\}$.
2. $\delta_F(q, i, Z) = \{(q, ZZ)\}$.
3. $\delta_F(q, e, Z) = \{(q, \epsilon)\}$.
4. $\delta_F(q, \epsilon, X_0) = \{(r, \epsilon)\}$.

Now, let us introduce a finite automaton

$$A = (\{s, t\}, \{i, e\}, \delta_A, s, \{s, t\})$$

that accepts the strings in the language of $i^*e^*$, that is, all strings of i’s followed by e’s. Call this language R. Transition function $\delta_A$ is given by the rules:

a) $\delta_A(s, i) = s$.
b) $\delta_A(s, e) = t$.
c) $\delta_A(t, e) = t$.


Strictly speaking, A is not a DFA, as assumed in Theorem 7.27, because it is 
missing a dead state for the case that we see input i when in state t. However, 
the same construction works even for an NFA, since the PDA that we construct
is allowed to be nondeterministic. In this case, the constructed PDA is actually 
deterministic, although it will “die” on certain sequences of input. 

We shall construct a PDA 


$$P = (\{p, q, r\} \times \{s, t\}, \{i, e\}, \{Z, X_0\}, \delta, (p, s), X_0, \{r\} \times \{s, t\})$$

The transitions of $\delta$ are listed below and indexed by the rule of PDA F (a number from 1 to 4) and the rule of DFA A (a letter a, b, or c) that gives rise to the rule. In the case that the PDA F makes an $\epsilon$-transition, there is no rule of A used. Note that we construct these rules in a “lazy” way, starting with the state of P that pairs the start states of F and A, and constructing rules for other states only if we discover that P can enter that pair of states.
1: $\delta((p, s), \epsilon, X_0) = \{((q, s), ZX_0)\}$.
2a: $\delta((q, s), i, Z) = \{((q, s), ZZ)\}$.
3b: $\delta((q, s), e, Z) = \{((q, t), \epsilon)\}$.
4: $\delta((q, s), \epsilon, X_0) = \{((r, s), \epsilon)\}$. Note: one can prove that this rule is never exercised. The reason is that it is impossible to pop the stack without seeing an e, and as soon as P sees an e the second component of its state becomes t.
3c: $\delta((q, t), e, Z) = \{((q, t), \epsilon)\}$.
4: $\delta((q, t), \epsilon, X_0) = \{((r, t), \epsilon)\}$.


The language $L \cap R$ is the set of strings with some number of i’s followed by one more e, that is, $\{i^ne^{n+1} \mid n \ge 0\}$. This set is exactly those if-else violations that consist of a block of if’s followed by a block of else’s. The language is evidently a CFL, generated by the grammar with productions $S \to iSe \mid e$.

Note that the PDA P accepts this language $L \cap R$. After pushing Z onto the stack, it pushes more Z’s onto the stack in response to inputs i, staying in state (q, s). As soon as it sees an e, it goes to state (q, t) and starts popping the stack. It dies if it sees an i until $X_0$ is exposed on the stack. At that point, it spontaneously transitions to state (r, t) and accepts.
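The product construction of Theorem 7.27 can be sketched for the two automata of this example. The sketch below assumes a dictionary encoding of moves (stack strings are written top first, "X" stands for $X_0$, and the empty string stands for $\epsilon$). Unlike the text, it builds $\delta$ eagerly for every pair of states rather than lazily; that only adds rules for unreachable pairs. The function `accepts` is an illustrative breadth-first simulation of a nondeterministic PDA that accepts by final state.

```python
from collections import deque

EPS = ""   # epsilon: no input consumed, or the empty stack string

# PDA F of this example: (state, input or EPS, top) -> set of (new state, pushed string).
# Stack strings list the top symbol first; "X" stands for X_0.
delta_F = {
    ("p", EPS, "X"): {("q", "ZX")},
    ("q", "i", "Z"): {("q", "ZZ")},
    ("q", "e", "Z"): {("q", EPS)},
    ("q", EPS, "X"): {("r", EPS)},
}
finals_F = {"r"}

# Automaton A for i*e* (no dead state, as in the text).
delta_A = {("s", "i"): "s", ("s", "e"): "t", ("t", "e"): "t"}
states_A, finals_A = {"s", "t"}, {"s", "t"}


def product_delta(delta_P, delta_A, states_A):
    """delta((q,p),a,X) holds ((r,s),g) iff (r,g) is in delta_P(q,a,X) and s = delta_A-hat(p,a)."""
    delta = {}
    for (q, a, X), moves in delta_P.items():
        for p in states_A:
            if a == EPS:
                s = p                       # A does not move on epsilon-input
            elif (p, a) in delta_A:
                s = delta_A[(p, a)]
            else:
                continue                    # A has no move, so neither does the product
            delta.setdefault(((q, p), a, X), set()).update(
                ((r, s), gamma) for r, gamma in moves)
    return delta


def accepts(delta, start, Z0, finals, w):
    """Breadth-first search over IDs; accept by final state once all of w is read."""
    todo, seen = deque([(start, 0, Z0)]), set()
    while todo:
        state, pos, stack = todo.popleft()
        if (state, pos, stack) in seen:
            continue
        seen.add((state, pos, stack))
        if pos == len(w) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        for r, gamma in delta.get((state, EPS, top), ()):        # epsilon-moves
            todo.append((r, pos, gamma + rest))
        if pos < len(w):
            for r, gamma in delta.get((state, w[pos], top), ()):
                todo.append((r, pos + 1, gamma + rest))
    return False


delta = product_delta(delta_F, delta_A, states_A)
finals = {(r, s) for r in finals_F for s in finals_A}
for w in ["e", "iee", "iiee", "iieee", "iei"]:
    print(w, accepts(delta, ("p", "s"), "X", finals, w))   # True only for i^n e^(n+1)
```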


Since we know that the CFL’s are not closed under intersection, but are 
closed under intersection with a regular language, we also know about the set- 
difference and complementation operations on CFL’s. We summarize these 
properties in one theorem. 


Theorem 7.29: The following are true about CFL’s L, $L_1$, and $L_2$, and a regular language R.

1. $L - R$ is a context-free language.
2. $\overline{L}$ is not necessarily a context-free language.
3. $L_1 - L_2$ is not necessarily context-free.

PROOF: For (1), note that $L - R = L \cap \overline{R}$. If R is regular, so is $\overline{R}$ regular by Theorem 4.5. Then $L - R$ is a CFL by Theorem 7.27.

For (2), suppose that $\overline{L}$ is always context-free when L is. Then since

$$L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$$

and the CFL’s are closed under union, it would follow that the CFL’s are closed under intersection. However, we know they are not from Example 7.26.

Lastly, let us prove (3). We know $\Sigma^*$ is a CFL for every alphabet $\Sigma$; designing a grammar or PDA for this regular language is easy. Thus, if $L_1 - L_2$ were always a CFL when $L_1$ and $L_2$ are, it would follow that $\Sigma^* - L$ was always a CFL when L is. However, $\Sigma^* - L$ is $\overline{L}$ when we pick the proper alphabet $\Sigma$. Thus, we would contradict (2) and we have proved by contradiction that $L_1 - L_2$ is not necessarily a CFL.


7.3.5 Inverse Homomorphism 


Let us review from Section 4.2.4 the operation called “inverse homomorphism.” 
If h is a homomorphism, and L is any language, then $h^{-1}(L)$ is the set of strings w such that h(w) is in L. The proof that regular languages are closed under inverse homomorphism was suggested in Fig. 4.6. There, we showed how to design a finite automaton that processes its input symbols a by applying a homomorphism h to it, and simulating another finite automaton on the sequence of inputs h(a).

We can prove this closure property of CFL’s in much the same way, by using 
PDA’s instead of finite automata. However, there is one problem that we face 
with PDA’s that did not arise when we were dealing with finite automata. The 
action of a finite automaton on a sequence of inputs is a state transition, and 
thus looks, as far as the constructed automaton is concerned, just like a move 
that a finite automaton might make on a single input symbol. 

When the automaton is a PDA, in contrast, a sequence of moves might not 
look like a move on one input symbol. In particular, in n moves, the PDA can 
pop n symbols off its stack, while one move can only pop one symbol. Thus, 
the construction for PDA’s that is analogous to Fig. 4.6 is somewhat more 
complex; it is sketched in Fig. 7.10. The key additional idea is that after input 
a is read, h(a) is placed in a “buffer.” The symbols of h(a) are used one at 
a time, and fed to the PDA being simulated. Only when the buffer is empty 
does the constructed PDA read another of its input symbols and apply the 
homomorphism to it. We shall formalize this construction in the next theorem. 


Theorem 7.30: Let L be a CFL and h a homomorphism. Then h⁻¹(L) is a
CFL.


PROOF: Suppose h applies to symbols of alphabet Σ and produces strings in
T*. We also assume that L is a language over alphabet T. As suggested above,
we start with a PDA P = (Q, T, Γ, δ, q0, Z0, F) that accepts L by final state.
We construct a new PDA

$$P' = (Q', \Sigma, \Gamma, \delta', (q_0, \epsilon), Z_0, F \times \{\epsilon\}) \qquad (7.1)$$

where:
1. Q' is the set of pairs (q, x) such that: 


(a) q is a state in Q, and 


(b) x is a suffix (not necessarily proper) of some string h(a) for some
input symbol a in Σ.
Figure 7.10: Constructing a PDA to accept the inverse homomorphism of what
a given PDA accepts (input a passes through h into a buffer holding h(a), which feeds the simulated PDA's state and stack; accept/reject)


That is, the first component of the state of P’ is the state of P, and the
second component is the buffer. We assume that the buffer will periodically
be loaded with a string h(a), and then allowed to shrink from the
front, as we use its symbols to feed the simulated PDA P. Note that since
Σ is finite, and h(a) is finite for all a, there are only a finite number of
states for P’.


2. δ’ is defined by the following rules:


(a) δ’((q, ε), a, X) = {((q, h(a)), X)} for all symbols a in Σ, all states
q in Q, and stack symbols X in Γ. Note that a cannot be ε here.
When the buffer is empty, P’ can consume its next input symbol a
and place h(a) in the buffer.


(b) If δ(q, b, X) contains (p, γ), where b is in T or b = ε, then
δ’((q, bx), ε, X) contains ((p, x), γ). That is, P’ always has the option of simulating
a move of P, using the front of its buffer. If b is a symbol in T, then
the buffer must not be empty, but if b = ε, then the buffer can be
empty.


3. Note that, as defined in (7.1), the start state of P’ is (q0, ε); i.e., P’ starts
in the start state of P with an empty buffer.


4. Likewise, the accepting states of P’, as per (7.1), are those states (q, ε)
such that q is an accepting state of P.


The following statement characterizes the relationship between P’ and P: 
$$(q_0, h(w), Z_0) \overset{*}{\underset{P}{\vdash}} (p, \epsilon, \gamma) \text{ if and only if } ((q_0, \epsilon), w, Z_0) \overset{*}{\underset{P'}{\vdash}} ((p, \epsilon), \epsilon, \gamma).$$


The proofs in both directions are inductions on the number of moves made by 
the two automata. In the “if” portion, one needs to observe that once the buffer 
of P' is nonempty, it cannot read another input symbol and must simulate P, 
until the buffer has become empty (although when the buffer is empty, it may 
still simulate P). We leave further details as an exercise. 

Once we accept this relationship between P’ and P, we note that P accepts
h(w) if and only if P’ accepts w, because of the way the accepting states of P’
are defined. Thus, L(P’) = h⁻¹(L(P)).
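The two rules defining δ’ are mechanical enough to transcribe directly. The Python sketch below assumes a representation of our own: δ is a dictionary mapping (q, b, X) to a set of (p, γ) pairs, where b is a one-character string over T or "" for ε, and γ is a tuple of stack symbols with the top first; h maps each symbol of Σ to a string over T. A state of P’ is a pair (q, x) with x the buffer contents.

```python
from itertools import product


def inverse_hom_pda(states, sigma, stack_symbols, delta, h):
    """Build the transition function delta' of P' from delta of P (rules (a) and (b))."""
    # Possible buffer contents: every suffix of every h(a), including "" and h(a) itself.
    buffers = {h[a][i:] for a in sigma for i in range(len(h[a]) + 1)}
    new_delta = {}

    # Rule (a): with an empty buffer, read input a and load h(a); the stack is unchanged.
    for q, a, X in product(states, sigma, stack_symbols):
        new_delta.setdefault(((q, ""), a, X), set()).add(((q, h[a]), (X,)))

    # Rule (b): simulate a move of P on b (a symbol of T, or "" for epsilon)
    # by taking b off the front of the buffer; P' itself reads no input ("").
    for (q, b, X), moves in delta.items():
        for buf in buffers:
            if buf.startswith(b):  # "" is a prefix of every buffer, including ""
                rest = buf[len(b):]
                for p, push in moves:
                    new_delta.setdefault(((q, buf), "", X), set()).add(((p, rest), push))
    return new_delta


# Hypothetical P: in state q, reading 0 with Z0 on top leaves the stack alone.
delta = {("q", "0", "Z0"): {("q", ("Z0",))}}
h = {"a": "00"}
d2 = inverse_hom_pda({"q"}, {"a"}, {"Z0"}, delta, h)
```

Per (7.1), the start state of P’ would be (q0, "") and its accepting states the pairs (f, "") for f in F; the sketch builds only δ’.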


7.3.6 Exercises for Section 7.3 


Exercise 7.3.1: Show that the CFL’s are closed under the following opera- 
tions: 


* a) init, defined in Exercise 4.2.6(c). Hint: Start with a CNF grammar for 
the language L. 


*! b) The operation L/a, defined in Exercise 4.2.2. Hint: Again, start with a 
CNF grammar for L. 


!! c) cycle, defined in Exercise 4.2.11. Hint: Try a PDA-based construction.
Exercise 7.3.2: Consider the following two languages: 


$$L_1 = \{a^n b^{2n} c^m \mid n, m \ge 0\}$$
$$L_2 = \{a^n b^m c^{2m} \mid n, m \ge 0\}$$


a) Show that each of these languages is context-free by giving grammars for 
each. 


! b) Is L1 ∩ L2 a CFL? Justify your answer.


! Exercise 7.3.3: Show that the CFL’s are not closed under the following
operations:
* a) min, as defined in Exercise 4.2.6(a).
b) max, as defined in Exercise 4.2.6(b).
c) half, as defined in Exercise 4.2.8.
d) alt, as defined in Exercise 4.2.7.


Exercise 7.3.4: The shuffle of two strings w and x is the set of all strings that
one can get by interleaving the positions of w and x in any way. More precisely, 
shuffle(w,x) is the set of strings z such that 


1. Each position of z can be assigned to w or x, but not both. 
2. The positions of z assigned to w form w when read from left to right.
3. The positions of z assigned to x form x when read from left to right. 


For example, if w = 01 and x = 110, then shuffle(01, 110) is the set of strings 
{01110, 01101, 10110, 10101, 11010, 11001}. To illustrate the necessary reason- 
ing, the fourth string, 10101, is justified by assigning the second and fifth po- 
sitions to 01 and positions one, three, and four to 110. The first string, 01110, 
has three justifications. Assign the first position and either the second, third, 
or fourth to 01, and the other three to 110. We can also define the shuffle of
languages, shuffle(L1, L2), to be the union over all pairs of strings, w from L1
and x from L2, of shuffle(w, x).
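For small examples the definition can be checked mechanically. The Python sketch below enumerates shuffle(w, x) by deciding, one position of z at a time, whether the next symbol comes from w or from x. The helper shuffle_languages is our own illustration and handles only finite languages.

```python
from functools import lru_cache


def shuffle(w, x):
    """All interleavings of w and x that keep each string's own left-to-right order."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # Interleavings of the remaining suffixes w[i:] and x[j:].
        if i == len(w):
            return frozenset({x[j:]})
        if j == len(x):
            return frozenset({w[i:]})
        take_w = {w[i] + rest for rest in go(i + 1, j)}
        take_x = {x[j] + rest for rest in go(i, j + 1)}
        return frozenset(take_w | take_x)

    return set(go(0, 0))


def shuffle_languages(L1, L2):
    """shuffle(L1, L2) for finite languages: union of shuffle(w, x) over all pairs."""
    return set().union(*(shuffle(w, x) for w in L1 for x in L2))
```

Because the result is a set, a string with several justifications, such as 01110 in shuffle(01, 110), appears only once.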


a) What is shuffle(00, 111)?
* b) What is shuffle(L1, L2) if L1 = L(0*) and L2 = {0^n 1^n | n ≥ 0}?
*! c) Show that if L1 and L2 are both regular languages, then so is
shuffle(L1, L2). Hint: Start with DFA’s for L1 and L2.
! d) Show that if L is a CFL and R is a regular language, then shuffle(L, R)
is a CFL. Hint: start with a PDA for L and a DFA for R.


!! e) Give a counterexample to show that if L1 and L2 are both CFL’s, then
shuffle(L1, L2) need not be a CFL.


*!! Exercise 7.3.5: A string y is said to be a permutation of the string x if the
symbols of y can be reordered to make x. For instance, the permutations
of string x = 011 are 110, 101, and 011. If L is a language, then perm(L)
is the set of strings that are permutations of strings in L. For example, if
L = {0^n 1^n | n ≥ 0}, then perm(L) is the set of strings with equal numbers of
0’s and 1’s.
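The definition of perm is easy to transcribe for single strings and finite languages. The Python sketch below assumes single-character symbols. It enumerates all |x|! orderings, so it is only a way to check small examples and says nothing about whether perm(L) is regular or context-free for an infinite L.

```python
from itertools import permutations


def perm(w):
    """All strings obtained by reordering the symbols of w (w itself included)."""
    return {"".join(p) for p in permutations(w)}


def perm_language(L):
    """perm(L) for a finite language L: the union of perm(w) over w in L."""
    return set().union(*(perm(w) for w in L))
```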


a) Give an example of a regular language L over alphabet {0,1} such that 
perm(L) is not regular. Justify your answer. Hint: Try to find a regular 
language whose permutations are all strings with an equal number of 0’s 
and 1’s. 


b) Give an example of a regular language L over alphabet {0,1,2} such that 
perm(L) is not context-free. 


c) Prove that for every regular language L over a two-symbol alphabet, 
perm(L) is context-free. 


Exercise 7.3.6: Give the formal proof of Theorem 7.25: that the CFL’s are 
closed under reversal. 
Exercise 7.3.7: Complete the proof of Theorem 7.27 by showing that

$$(q_P, w, Z_0) \overset{*}{\underset{P}{\vdash}} (q, \epsilon, \gamma)$$

if and only if $((q_P, q_A), w, Z_0) \overset{*}{\underset{P'}{\vdash}} ((q, p), \epsilon, \gamma)$, where $p = \hat{\delta}(q_A, w)$.


7.4 Decision Properties of CFL’s 


Now, let us consider what kinds of questions we can answer about context-free 
languages. In analogy with Section 4.3 about decision properties of the regular 
languages, our starting point for a question is always some representation of a 
CFL — either a grammar or a PDA. Since we know from Section 6.3 that we 
can convert between grammars and PDA’s, we may assume we are given either 
representation of a CFL, whichever is more convenient. 

We shall discover that very little can be decided about a CFL; the major 
tests we are able to make are whether the language is empty and whether a given 
string is in the language. We thus close the section with a brief discussion of the 
kinds of problems that we shall later show (in Chapter 9) are “undecidable,” 
i.e., they have no algorithm. We begin this section with some observations 
about the complexity of converting between the grammar and PDA notations 
for a language. These calculations enter into any question of how efficiently we 
can decide a property of CFL’s with a given representation. 


7.4.1 Complexity of Converting Among CFG’s and PDA’s 


Before proceeding to the algorithms for deciding questions about CFL’s, let 
us consider the complexity of converting from one representation to another. 
The running time of the conversion is a component of the cost of the decision 
algorithm whenever the language is given in a form other than the one for which 
the algorithm is designed. 

In what follows, we shall let n be the length of the entire representation of 
a PDA or CFG. Using this parameter as the representation of the size of the 
grammar or automaton is “coarse,” in the sense that some algorithms have a 
running time that could be described more precisely in terms of more specific 
parameters, such as the number of variables of a grammar or the sum of the 
lengths of the stack strings that appear in the transition function of a PDA. 

However, the total-length measure is sufficient to distinguish the most im- 
portant issues: is an algorithm linear in the length (i.e., does it take little more 
time than it takes to read its input), is it exponential in the length (i.e., you can 
perform the conversion only for rather small examples), or is it some nonlinear 
polynomial (i.e., you can run the algorithm, even for large examples, but the 
time is often quite significant). 

There are several conversions we have seen so far that are linear in the size 
of the input. Since they take linear time, the representation that they produce 
as output is not only produced quickly, but it is of size comparable to the input
size. These conversions are: 


1. Converting a CFG to a PDA, by the algorithm of Theorem 6.13. 


2. Converting a PDA that accepts by final state to a PDA that accepts by 
empty stack, using the construction of Theorem 6.11. 


3. Converting a PDA that accepts by empty stack to a PDA that accepts 
by final state, using the construction of Theorem 6.9. 


On the other hand, the running time of the conversion from a PDA to a 
grammar (Theorem 6.14) is much more complex. First, note that n, the total 
length of the input, is surely an upper bound on the number of states and 
stack symbols, so there cannot be more than n³ variables of the form [pXq]
constructed for the grammar. However, the running time of the conversion can 
be exponential, if there is a transition of the PDA that puts a large number of 
symbols on the stack. Note that one rule could place almost n symbols on the 
stack. 

If we review the construction of grammar productions from a rule like
“δ(q, a, X) contains (r0, Y1Y2···Yk),” we note that it gives rise to a collection
of productions of the form

$$[qXr_k] \to a[r_0Y_1r_1][r_1Y_2r_2]\cdots[r_{k-1}Y_kr_k]$$

for all lists of states r1, r2, ..., rk. As k could be close to n, and there could be close
to n states, the total number of productions grows as n^n. We cannot carry out
such a construction for reasonably sized PDA’s if the PDA has even one long
stack string to write.
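To see the blowup concretely, the Python sketch below generates the productions for a single move, writing the variable [pYs] as the tuple (p, Y, s); the representation and the name productions_for_move are our own. One move that pushes k symbols yields |Q|^k productions, one for each choice of r1, ..., rk.

```python
from itertools import product


def productions_for_move(q, a, X, r0, ys, states):
    """All productions [q X rk] -> a [r0 Y1 r1][r1 Y2 r2]...[r(k-1) Yk rk]
    arising from one move delta(q, a, X) containing (r0, Y1...Yk).
    A variable [p Y s] is the tuple (p, Y, s); a == "" stands for epsilon."""
    prods = []
    for rs in product(states, repeat=len(ys)):  # every list r1, ..., rk
        chain = (r0,) + rs                      # r0, r1, ..., rk
        body = ([a] if a else []) + [(chain[i], ys[i], chain[i + 1]) for i in range(len(ys))]
        prods.append(((q, X, chain[-1]), body))
    return prods


states = ["s1", "s2", "s3"]
long_move = productions_for_move("q", "a", "X", "r0", ["Y1", "Y2", "Y3", "Y4"], states)
short_move = productions_for_move("q", "a", "X", "r0", ["Y1", "Y2"], states)
```

With three states, a four-symbol push already gives 3^4 = 81 productions, while a two-symbol push gives 3^2 = 9.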

Fortunately, this worst case never has to occur. As was suggested by Exercise
6.2.8, we can break the pushing of a long string of stack symbols into a
sequence of at most n steps that each pushes one symbol. That is, if δ(q, a, X)
contains (r0, Y1Y2···Yk), we may introduce new states p2, p3, ..., p_{k−1}. Then,
we replace (r0, Y1Y2···Yk) in δ(q, a, X) by (p_{k−1}, Y_{k−1}Yk), and introduce the
new transitions

$$\delta(p_{k-1}, \epsilon, Y_{k-1}) = \{(p_{k-2}, Y_{k-2}Y_{k-1})\}, \quad \delta(p_{k-2}, \epsilon, Y_{k-2}) = \{(p_{k-3}, Y_{k-3}Y_{k-2})\}$$

and so on, down to δ(p2, ε, Y2) = {(r0, Y1Y2)}.
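The chain of new transitions can be generated mechanically. The following Python sketch assumes a move is written ((state, input, top), (new_state, pushed)), with pushed a tuple whose first element becomes the new top and "" standing for ε; fresh is a hypothetical caller-supplied function that names the new states.

```python
def split_push(q, a, X, r0, ys, fresh):
    """Replace the move delta(q, a, X) containing (r0, Y1...Yk) by moves that each
    write at most two stack symbols. fresh(i) must return a brand-new state p_i.
    Each move is ((state, input, top), (new_state, pushed_string)); "" means epsilon."""
    k = len(ys)
    if k <= 2:
        return [((q, a, X), (r0, tuple(ys)))]
    p = {i: fresh(i) for i in range(2, k)}          # new states p_2, ..., p_{k-1}
    moves = [((q, a, X), (p[k - 1], (ys[k - 2], ys[k - 1])))]  # push Y_{k-1} Y_k
    for i in range(k - 1, 1, -1):                    # delta(p_i, eps, Y_i) = {(p_{i-1}, Y_{i-1} Y_i)}
        target = p[i - 1] if i > 2 else r0           # the chain ends in r0
        moves.append(((p[i], "", ys[i - 1]), (target, (ys[i - 2], ys[i - 1]))))
    return moves


moves = split_push("q", "a", "X", "r0", ["Y1", "Y2", "Y3", "Y4"], lambda i: f"p{i}")
```

For k = 4 the sketch produces three moves: the original move now pushes Y3Y4 and enters p3, then p3 replaces Y3 by Y2Y3 and enters p2, and p2 replaces Y2 by Y1Y2 and enters r0, leaving Y1Y2Y3Y4 on the stack as before.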

Now, no transition has more than two stack symbols. We have added at
most n new states, and the total length of all the transition rules of δ has grown
by at most a constant factor; i.e., it is still O(n). There are O(n) transition
rules, and each generates O(n²) productions, since there are only two states
that need to be chosen in the productions that come from each rule. Thus, the
constructed grammar has length O(n³) and can be constructed in cubic time.
We summarize this informal analysis in the theorem below.


Theorem 7.31: There is an O(n³) algorithm that takes a PDA P whose
representation has length n and produces a CFG G of length at most O(n³). This
CFG generates the same language as P accepts by empty stack. Optionally, we
can cause G to generate the language that P accepts by final state.
7.4.2 Running Time of Conversion to Chomsky Normal Form


As decision algorithms may depend on first putting a CFG into Chomsky Nor- 
mal Form, we should also look at the running time of the various algorithms 
that we used to convert an arbitrary grammar to a CNF grammar. Most of the 
steps preserve, up to a constant factor, the length of the grammar’s description; 
that is, starting with a grammar of length n they produce another grammar of 
length O(n). The good news is summarized in the following list of observations: 


1. Using the proper algorithm (see Section 7.4.3), detecting the reachable 
and generating symbols of a grammar can be done in O(n) time. Elimi- 
nating the resulting useless symbols takes O(n) time and does not increase 
the size of the grammar. 


2. Constructing the unit pairs and eliminating unit productions, as in Section
7.1.4, takes O(n²) time and the resulting grammar has length O(n²).


3. The replacement of terminals by variables in production bodies, as in 
Section 7.1.5 (Chomsky Normal Form), takes O(n) time and results in a 
grammar whose length is O(n). 


4. The breaking of production bodies of length 3 or more into bodies of 
length 2, as carried out in Section 7.1.5 also takes O(n) time and results 
in a grammar of length O(n). 


The bad news concerns the construction of Section 7.1.3, where we eliminate
ε-productions. If we have a production body of length k, we could construct
from that one production 2^k − 1 productions for the new grammar. Since k
could be proportional to n, this part of the construction could take O(2^n) time
and result in a grammar whose length is O(2^n).

To avoid this exponential blowup, we need only to bound the length of
production bodies. The trick of Section 7.1.5 can be applied to any production
body, not just to one without terminals. Thus, we recommend, as a preliminary
step before eliminating ε-productions, the breaking of all long production bodies
into a sequence of productions with bodies of length 2. This step takes O(n)
time and grows the grammar only linearly. The construction of Section 7.1.3,
to eliminate ε-productions, will work on bodies of length at most 2 in such a
way that the running time is O(n) and the resulting grammar has length O(n).
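The recommended order of steps is easy to prototype. In the Python sketch below, which uses a representation of our own (a production is a (head, body) pair with the body a tuple of symbols), binarize breaks long bodies first, and drop_nullable then performs the body-expansion part of ε-production elimination, given a set of nullable variables computed elsewhere. The fresh names _V1, _V2, ... are hypothetical and assumed not to clash with the grammar's own symbols.

```python
from itertools import product


def binarize(productions):
    """Break every body of length 3 or more into a chain of length-2 bodies.
    productions: list of (head, body) with body a tuple; new names _V1, _V2, ...
    are hypothetical and assumed not to clash with the grammar's own symbols."""
    result, counter = [], 0
    for head, body in productions:
        while len(body) > 2:
            counter += 1
            fresh = f"_V{counter}"
            result.append((head, (body[0], fresh)))  # A -> X1 V
            head, body = fresh, body[1:]              # V -> X2 ... Xk, handled next
        result.append((head, body))
    return result


def drop_nullable(productions, nullable):
    """Section 7.1.3 style: for each body, add every version with some nullable
    symbols omitted, but never the empty body. A body of length <= 2 yields at
    most 3 productions, so the grammar grows only by a constant factor."""
    out = set()
    for head, body in productions:
        # Each symbol is either kept, or (if nullable) optionally dropped.
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(sym for part in choice for sym in part)
            if new_body:
                out.add((head, new_body))
    return out
```

Applied to a body of length k without binarizing first, drop_nullable would produce up to 2^k − 1 versions, which is exactly the blowup the preliminary step avoids.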

With this modification to the overall CNF construction, the only step that
is not linear is the elimination of unit productions. As that step is O(n²), we
conclude the following:


Theorem 7.32: Given a grammar G of length n, we can find an equivalent
Chomsky-Normal-Form grammar for G in time O(n²); the resulting grammar
has length O(n²).
7.4.3 Testing Emptiness of CFL’s


We have already seen the algorithm for testing whether a CFL L is empty. 
Given a grammar G for the language L, use the algorithm of Section 7.1.2 to 
decide whether the start symbol S of G is generating, i.e., whether S derives 
at least one string. L is empty if and only if S is not generating. 

Because of the importance of this test, we shall consider in detail how much 
time it takes to find all the generating symbols of a grammar G. Suppose 
the length of G is n. Then there could be on the order of n variables, and 
each pass of the inductive discovery of generating variables could take O(n) 
time to examine all the productions of G. If only one new generating variable 
is discovered on each pass, then there could be O(n) passes. Thus, a naive 
implementation of the generating-symbols test is O(n²).

However, there is a more careful algorithm that sets up a data structure in 
advance to make our discovery of generating symbols take O(n) time only. The 
data structure, suggested in Fig. 7.11, starts with an array indexed by the vari- 
ables, as shown on the left, which tells whether or not we have established that 
the variable is generating. In Fig. 7.11, the array suggests that we have discov- 
ered B is generating, but we do not know whether or not A is generating. At the 
end of the algorithm, each question mark will become “no,” since any variable 
not discovered by the algorithm to be generating is in fact nongenerating. 


Figure 7.11: Data structure for the linear-time emptiness test (“Generating?” array: A ?, B yes)


The productions are preprocessed by setting up several kinds of useful links. 
First, for each variable there is a chain of all the positions in which that vari- 
able appears. For instance, the chain for variable B is suggested by the solid 
lines. For each production, there is a count of the number of positions holding 
variables whose ability to generate a terminal string has not yet been taken into 
account. The dashed lines suggest links from the productions to their counts. 
The counts shown in Fig. 7.11 suggest that we have not yet taken any of the 
variables into account, even though we just established that B is generating. 

Suppose that we have discovered that B is generating. We go down the list 
of positions of the bodies holding B. For each such position, we decrement the 
count for that production by 1; there is now one fewer position we need to find 
generating in order to conclude that the variable at the head is also generating. 
Other Uses for the Linear Emptiness Test


The same data structure and accounting trick that we used in Section 7.4.3 
to test whether a variable is generating can be used to make some of the 
other tests of Section 7.1 linear-time. Two important examples are: 


1. Which symbols are reachable? 


2. Which symbols are nullable? 


If a count reaches 0, then we know the head variable is generating. A 
link, suggested by the dotted lines, gets us to the variable, and we may put 
that variable on a queue of generating variables whose consequences need to be 
explored (as we just did for variable B). This queue is not shown. 

We must argue that this algorithm takes O(n) time. The important points 
are as follows: 


• Since there are at most n variables in a grammar of size n, creation and
initialization of the array takes O(n) time. 


• There are at most n productions, and their total length is at most n, so
initialization of the links and counts suggested in Fig. 7.11 can be done 
in O(n) time. 


• When we discover a production has count 0 (i.e., all positions of its body
are generating), the work involved can be put into two categories: 


1. Work done for that production: discovering the count is 0, finding 
which variable, say A, is at the head, checking whether it is already 
known to be generating, and putting it on the queue if not. All these 
steps are O(1) for each production, and so at most O(n) work of this 
type is done in total. 

2. Work done when visiting the positions of the production bodies that 
have the head variable A. This work is proportional to the number 
of positions with A. Therefore, the aggregate amount of work done 
processing all generating symbols is proportional to the sum of the 
lengths of the production bodies, and that is O(n). 


We conclude that the total work done by this algorithm is O(n). 
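A compact Python rendering of this bookkeeping follows. It is a sketch under our own representation choices: productions are (head, body) pairs, a symbol is treated as a terminal exactly when it is in the given set, and Python lists and a deque stand in for the chains and the queue suggested by Fig. 7.11. Each variable enters the queue at most once, and each body position is visited at most once, which is where the O(n) bound comes from.

```python
from collections import defaultdict, deque


def generating_variables(productions, terminals):
    """Linear-time discovery of generating variables, in the style of Fig. 7.11.
    productions: list of (head, body) pairs with body a tuple of symbols."""
    positions = defaultdict(list)  # variable -> one production index per body position
    count = []                     # production index -> positions not yet known generating
    generating, queue = set(), deque()

    def discover(var):
        if var not in generating:
            generating.add(var)
            queue.append(var)

    for idx, (head, body) in enumerate(productions):
        variables = [s for s in body if s not in terminals]
        count.append(len(variables))
        for v in variables:
            positions[v].append(idx)
        if not variables:          # body of terminals only (or epsilon)
            discover(head)

    while queue:
        var = queue.popleft()
        for idx in positions[var]:  # visit each position holding var exactly once
            count[idx] -= 1
            if count[idx] == 0:
                discover(productions[idx][0])
    return generating


# Example: S -> AB | a and A -> b; B heads no production, so B is not generating.
grammar = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
gen = generating_variables(grammar, {"a", "b"})
```

L(G) is empty exactly when the start symbol is missing from the returned set.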


7.4.4 Testing Membership in a CFL 


We can also decide membership of a string w in a CFL L. There are several
inefficient ways to make the test; they take time that is exponential in |w|,
assuming a grammar or PDA for the language L is given and its size is treated
as a constant, independent of w. For instance, start by converting whatever 
representation of L we are given into a CNF grammar for L. As the parse trees 
of a Chomsky-Normal-Form grammar are binary trees, if w is of length n then 
there will be exactly 2n − 1 nodes labeled by variables in the tree (that result
has an easy, inductive proof, which we leave to you). The number of possible 
trees and node-labelings is thus “only” exponential in n, so in principle we can 
list them all and check to see if any of them yields w. 

There is a much more efficient technique based on the idea of “dynamic 
programming,” which may also be known to you as a “table-filling algorithm” 
or “tabulation.” This algorithm, known as the CYK Algorithm,³ starts with a
CNF grammar G = (V, T, P, S) for a language L. The input to the algorithm is
a string w = a1a2···an in T*. In O(n³) time, the algorithm constructs a table
that tells whether w is in L. Note that when computing this running time, 
the grammar itself is considered fixed, and its size contributes only a constant 
factor to the running time, which is measured in terms of the length of the 
string w whose membership in L is being tested. 

In the CYK algorithm, we construct a triangular table, as suggested in
Fig. 7.12. The horizontal axis corresponds to the positions of the string
w = a1a2···an, which we have supposed has length 5. The table entry X_{ij} is the
set of variables A such that A ⇒* a_i a_{i+1}···a_j. Note in particular that we are
interested in whether S is in the set X_{1n}, because that is the same as saying
S ⇒* w, i.e., w is in L.


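Figure 7.12: The table constructed by the CYK algorithm (a triangle of entries X_{ij} over positions a1···a5: the bottom row holds X_{11}, ..., X_{55}, and the top entry is X_{15})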


To fill the table, we work row-by-row, upwards. Notice that each row cor- 
responds to one length of substrings; the bottom row is for strings of length 1, 
the second-from-bottom row for strings of length 2, and so on, until the top row 
corresponds to the one substring of length n, which is w itself. It takes O(n) 
time to compute any one entry of the table, by a method we shall discuss next. 


³It is named after three people, each of whom independently discovered essentially the
same idea: J. Cocke, D. Younger, and T. Kasami.
Since there are n(n + 1)/2 table entries, the whole table-construction process
takes O(n³) time. Here is the algorithm for computing the X_{ij}’s:


BASIS: We compute the first row as follows. Since the string beginning and
ending at position i is just the terminal a_i, and the grammar is in CNF, the
only way to derive the string a_i is to use a production of the form A → a_i.
Thus, X_{ii} is the set of variables A such that A → a_i is a production of G.


INDUCTION: Suppose we want to compute X_{ij}, which is in row j − i + 1, and
we have computed all the X’s in the rows below. That is, we know about all
strings shorter than a_i a_{i+1}···a_j, and in particular we know about all proper
prefixes and proper suffixes of that string. As j − i > 0 may be assumed (since
the case i = j is the basis), we know that any derivation A ⇒* a_i a_{i+1}···a_j must
start out with some step A ⇒ BC. Then, B derives some prefix of a_i a_{i+1}···a_j,
say B ⇒* a_i a_{i+1}···a_k, for some k < j. Also, C must then derive the remainder
of a_i a_{i+1}···a_j, that is, C ⇒* a_{k+1}a_{k+2}···a_j.

We conclude that in order for A to be in X_{ij}, we must find variables B and
C, and integer k such that:


1. i ≤ k < j.

2. B is in X_{ik}.

3. C is in X_{k+1,j}.

4. A → BC is a production of G.


Finding such variables A requires us to compare at most n pairs of previously
computed sets: (X_{ii}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), and so on, until (X_{i,j−1}, X_{jj}).
The pattern, in which we go up the column below X_{ij} at the same time we go
down the diagonal, is suggested by Fig. 7.13.


Figure 7.13: Computation of X_{ij} requires matching the column below with the
diagonal to the right


Theorem 7.33: The algorithm described above correctly computes X_{ij} for all
i and j; thus w is in L(G) if and only if S is in X_{1n}. Moreover, the running
time of the algorithm is O(n³).
PROOF: The reason the algorithm finds the correct sets of variables was explained
as we introduced the basis and inductive parts of the algorithm. For the
running time, note that there are O(n²) entries to compute, and each involves
comparing and computing with n pairs of entries. It is important to remember
that, although there can be many variables in each set X_{ij}, the grammar G is
fixed and the number of its variables does not depend on n, the length of the
string w whose membership is being tested. Thus, the time to compare two
entries X_{ik} and X_{k+1,j}, and find variables to go into X_{ij}, is O(1). As there are
at most n such pairs for each X_{ij}, the total work is O(n³).
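The basis and induction translate almost line for line into code. The Python sketch below is our own transcription, tried on the grammar and string of Example 7.34, which follows. It assumes the CNF productions are supplied as two lists, one for rules A → a and one for rules A → BC, and it stores X_{ij} in a dictionary keyed by (i, j) with 1-based positions to match the text. The loops over length, i, and k give the O(n³) bound; the loop over productions depends only on the fixed grammar.

```python
def cyk(word, unit_rules, pair_rules):
    """CYK table for a CNF grammar and a nonempty word.
    unit_rules: (A, a) pairs for A -> a;  pair_rules: (A, B, C) triples for A -> BC.
    Returns X with X[(i, j)] = set of variables deriving a_i ... a_j (1-based)."""
    n = len(word)
    X = {}
    for i in range(1, n + 1):                       # BASIS: bottom row
        X[(i, i)] = {A for A, a in unit_rules if a == word[i - 1]}
    for length in range(2, n + 1):                  # INDUCTION: rows 2, ..., n
        for i in range(1, n - length + 2):
            j = i + length - 1
            X[(i, j)] = {A
                         for k in range(i, j)       # i <= k < j
                         for A, B, C in pair_rules
                         if B in X[(i, k)] and C in X[(k + 1, j)]}
    return X


# The grammar of Example 7.34: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
units = [("A", "a"), ("B", "b"), ("C", "a")]
pairs = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"), ("B", "C", "C"), ("C", "A", "B")]
table = cyk("baaba", units, pairs)
in_language = "S" in table[(1, 5)]
```

For baaba the top entry table[(1, 5)] is {S, A, C}, so baaba is in L(G).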


Example 7.34: The following are the productions of a CNF grammar G:


S → AB | BC
A → BA | a
B → CC | b
C → AB | a


We shall test for membership in L(G) the string baaba. Figure 7.14 shows the
table filled in for this string.

{S,A,C}
∅        {S,A,C}
∅        {B}      {B}
{S,A}    {B}      {S,C}    {S,A}
{B}      {A,C}    {A,C}    {B}      {A,C}

b        a        a        b        a

Figure 7.14: The table for string baaba constructed by the CYK algorithm


To construct the first (lowest) row, we use the basis rule. We have only to
consider which variables have a production body a (those variables are A and
C) and which variables have body b (only B does). Thus, above those positions
holding a we see the entry {A,C}, and above the positions holding b we see
{B}. That is, X_{11} = X_{44} = {B}, and X_{22} = X_{33} = X_{55} = {A,C}.

In the second row we see the values of X_{12}, X_{23}, X_{34}, and X_{45}. For instance,
let us see how X_{12} is computed. There is only one way to break the string from
positions 1 to 2, which is ba, into two nonempty substrings. The first must be
position 1 and the second must be position 2. In order for a variable to generate
ba, it must have a body whose first variable is in X_{11} = {B} (i.e., it generates
the b) and whose second variable is in X_{22} = {A,C} (i.e., it generates the a).
This body can only be BA or BC. If we inspect the grammar, we find that the
productions A → BA and S → BC are the only ones with these bodies. Thus,
the two heads, A and S, constitute X_{12}.

For a more complex example, consider the computation of X_{24}. We can
break the string aab that occupies positions 2 through 4 by ending the first
string after position 2 or position 3. That is, we may choose k = 2 or k = 3 in
the definition of X_{24}. Thus, we must consider all bodies in X_{22}X_{34} ∪ X_{23}X_{44}.
This set of strings is {A,C}{S,C} ∪ {B}{B} = {AS, AC, CS, CC, BB}. Of the
five strings in this set, only CC is a body, and its head is B. Thus, X_{24} = {B}.
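The set computation in the last paragraph can be replayed directly. In this Python sketch, variables are single letters, so a two-variable body can be represented as a two-character string; the dictionary heads_of is our own encoding of the grammar of Example 7.34, mapping each binary body to the variables that have it as a production body.

```python
def concat(S1, S2):
    """All two-variable strings BC with B in S1 and C in S2."""
    return {b + c for b in S1 for c in S2}


# Entries already computed in Fig. 7.14, and the binary bodies of the grammar G.
X22, X34, X23, X44 = {"A", "C"}, {"S", "C"}, {"B"}, {"B"}
heads_of = {"AB": {"S", "C"}, "BC": {"S"}, "BA": {"A"}, "CC": {"B"}}

candidates = concat(X22, X34) | concat(X23, X44)   # choices k = 2 and k = 3
X24 = set().union(*(heads_of.get(body, set()) for body in candidates))
```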


7.4.5 Preview of Undecidable CFL Problems 


In the next chapters we shall develop a remarkable theory that lets us prove 
formally that there are problems we cannot solve by any algorithm that can 
run on a computer. We shall use it to show that a number of simple-to-state 
questions about grammars and CFL’s have no algorithm; they are called “un- 
decidable problems.” For now, we shall have to content ourselves with a list 
of the most significant undecidable questions about context-free grammars and 
languages. The following are undecidable: 


1. Is a given CFG G ambiguous? 

2. Is a given CFL inherently ambiguous? 

3. Is the intersection of two CFL’s empty? 

4. Are two CFL’s the same? 

5. Is a given CFL equal to Σ*, where Σ is the alphabet of this language?


Notice that the flavor of question (1), about ambiguity, is somewhat different 
from the others, in that it is a question about a grammar, not a language. All 
the other questions assume that the language is represented by a grammar or 
PDA, but the question is about the language(s) defined by the grammar or 
PDA. For instance, in contrast to question (1), the second question asks, given 
a grammar G (or a PDA, for that matter), does there exist some equivalent 
grammar G’ that is unambiguous. If G is itself unambiguous, then the answer
is surely “yes,” but if G is ambiguous, there could still be some other grammar 
G' for the same language that is unambiguous, as we learned about expression 
grammars in Example 5.27. 


7.4.6 Exercises for Section 7.4 
Exercise 7.4.1: Give algorithms to decide the following: 
* a) Is L(G) finite, for a given CFG G? Hint: Use the pumping lemma. 


! b) Does L(G) contain at least 100 strings, for a given CFG G? 
!! c) Given a CFG G and one of its variables A, is there any sentential form
in which A is the first symbol? Note: Remember that it is possible for A
to appear first in the middle of some sentential form but then for all the
symbols to its left to derive ε.


Exercise 7.4.2: Use the technique described in Section 7.4.3 to develop linear- 
time algorithms for the following questions about CFG’s: 


a) Which symbols appear in some sentential form? 


b) Which symbols are nullable (derive ε)?


Exercise 7.4.3: Using the grammar G of Example 7.34, use the CYK algo- 
rithm to determine whether each of the following strings is in L(G): 


* a) ababa. 
b) baaab. 


c) aabab. 


* Exercise 7.4.4: Show that in any CNF grammar, all parse trees for strings of
length n have 2n − 1 interior nodes (i.e., 2n − 1 nodes with variables for labels).


! Exercise 7.4.5: Modify the CYK algorithm to report the number of distinct 
parse trees for the given input, rather than just reporting membership in the 
language. 


7.5 Summary of Chapter 7 


+ Eliminating Useless Symbols: A variable can be eliminated from a CFG 
unless it derives some string of terminals and also appears in at least 
one string derived from the start symbol. To correctly eliminate such 
useless symbols, we must first test whether a variable derives a terminal 
string, and eliminate those that do not, along with all their productions. 
Only then do we eliminate variables that are not derivable from the start 
symbol. 


+ Eliminating ε- and Unit-productions: Given a CFG, we can find another
CFG that generates the same language, except for string ε, yet has no
ε-productions (those with body ε) or unit productions (those with a single
variable as the body).


+ Chomsky Normal Form: Given a CFG that derives at least one nonempty 
string, we can find another CFG that generates the same language, except 
for ε, and is in Chomsky Normal Form: there are no useless symbols, and
every production body consists of either two variables or one terminal. 
+ The Pumping Lemma: In any CFL, it is possible to find, in any sufficiently
long string of the language, a short substring such that the two ends of
that substring can be “pumped” in tandem; i.e., each can be repeated any
desired number of times. The strings being pumped are not both ε. This
lemma, and a more powerful version called Ogden’s lemma mentioned in 
Exercise 7.2.3, allow us to prove many languages not to be context-free. 


+ Operations That Preserve Context-Free Languages: The CFL’s are closed 
under substitution, union, concatenation, closure (star), reversal, and in- 
verse homomorphisms. CFL’s are not closed under intersection or com- 
plementation, but the intersection of a CFL and a regular language is 
always a CFL. 


+ Testing Emptiness of a CFL: Given a CFG, there is an algorithm to tell 
whether it generates any strings at all. A careful implementation allows 
this test to be conducted in time that is proportional to the size of the 
grammar itself. 


+ Testing Membership in a CFL: The Cocke-Younger-Kasami algorithm 
tells whether a given string is in a given context-free language. For a 
fixed CFL, this test takes time O(n³), if n is the length of the string
being tested. 


7.6 Gradiance Problems for Chapter 7 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 7.1: The operation Perm(w), applied to a string w, is all strings 
that can be constructed by permuting the symbols of w in any order. For 
example, if w = 101, then Perm(w) is all strings with two 1’s and one 0, i.e., 
Perm(w) = {101, 110, 011}. If L is a regular language, then Perm(L) is the 
union of Perm(w) taken over all w in L. For example, if L is the language 
L(0*1*), then Perm(L) is all strings of 0’s and 1’s, i.e., L((0 + 1)*). If L is 
regular, Perm(L) is sometimes regular, sometimes context-free but not regular, 
and sometimes not even context-free. Consider each of the following regular 
expressions R below, and decide whether Perm(L(R)) is regular, context-free, 
or neither: 


1. (01)* 


2. 0* + 1* 


3. (012)* 
4. (01 + 2)* 
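Because Perm(w) contains exactly the rearrangements of w, a string s is in Perm(w) precisely when s and w contain every symbol the same number of times. A minimal C sketch of that membership test follows; the function name in_perm is hypothetical.

```c
#include <limits.h>
#include <string.h>

/* Returns 1 if s is in Perm(w), i.e., s and w contain every symbol
   the same number of times; returns 0 otherwise. */
int in_perm(const char *s, const char *w)
{
    long count[UCHAR_MAX + 1] = {0};
    size_t n = strlen(s);
    if (n != strlen(w))
        return 0;
    for (size_t i = 0; i < n; i++) {
        count[(unsigned char)s[i]]++;   /* symbols of s add ... */
        count[(unsigned char)w[i]]--;   /* ... symbols of w subtract */
    }
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (count[c] != 0)
            return 0;
    return 1;
}
```

For w = 101, in_perm accepts 101, 110, and 011, and rejects any string with a different number of 0’s or 1’s.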


Problem 7.2: The language L = {ss | s is a string of a’s and b’s} is not a 
context-free language. In order to prove that L is not context-free we need to 
show that for every integer n, there is some string z in L, of length at least n, 
such that no matter how we break z up as z = uvwxy, subject to the constraints 
|vwx| ≤ n and |vx| > 0, there is some i ≥ 0 such that uv^i wx^i y is not in L. 

Let us focus on a particular z = aabaaaba and n = 7. It turns out that this 
is the wrong choice of z for n = 7, since there are some ways to break z up for 
which we can find the desired i, and for others, we cannot. Identify from the 
list below the choice of u, v, w, x, y for which there is an i that makes 
uv^i wx^i y not be in L. We show the breakup of aabaaaba by placing four |’s 
among the a’s and b’s. The resulting five pieces (some of which may be empty) 
are the five strings. For instance, aa|b||aaaba| means u = aa, v = b, w = ε, 
x = aaaba, and y = ε. 
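Checking one particular breakup can be mechanized. The C sketch below (is_square, pump, and find_bad_i are hypothetical names) tests membership in L by comparing the two halves of a string, and searches i = 0, 1, ..., max_i for a pumped string uv^i wx^i y that falls outside L. It does not check the constraints |vwx| ≤ n and |vx| > 0. A bounded search that finds no bad i does not by itself prove that none exists.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if s has the form tt for some string t, i.e., s is in L. */
int is_square(const char *s)
{
    size_t n = strlen(s);
    return n % 2 == 0 && strncmp(s, s + n / 2, n / 2) == 0;
}

/* Builds u v^i w x^i y in a newly malloc'ed buffer; the caller frees it. */
char *pump(const char *u, const char *v, const char *w,
           const char *x, const char *y, unsigned i)
{
    size_t len = strlen(u) + i * (strlen(v) + strlen(x)) + strlen(w) + strlen(y);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    strcpy(out, u);
    for (unsigned k = 0; k < i; k++)
        strcat(out, v);
    strcat(out, w);
    for (unsigned k = 0; k < i; k++)
        strcat(out, x);
    strcat(out, y);
    return out;
}

/* Returns the first i in 0..max_i with u v^i w x^i y not in L, or -1 if none
   is found (or if memory runs out). */
int find_bad_i(const char *u, const char *v, const char *w,
               const char *x, const char *y, unsigned max_i)
{
    for (unsigned i = 0; i <= max_i; i++) {
        char *s = pump(u, v, w, x, y, i);
        if (s == NULL)
            return -1;
        int in_l = is_square(s);
        free(s);
        if (!in_l)
            return (int)i;
    }
    return -1;
}
```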


Problem 7.3: Apply the CYK algorithm to the input ababaa and the grammar: 


S → AB | BC 
A → BA | a 
B → CC | b 
C → AB | a 


Compute the table of entries X_{ij} = the set of nonterminals that derive positions 
i through j, inclusive, of the string ababaa. Then, identify a true assertion about 
one of the X_{ij}’s in the list below. 
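A compact C sketch of the CYK table computation for the grammar of Problem 7.3 follows. Each X_{ij} is stored as a bitmask over the variables, and positions are 0-based in the code, so table[i][j] corresponds to X_{i+1,j+1}. The encoding, the names, and the MAXLEN limit are assumptions for illustration. The three nested loops over substring length, start position, and split point are what give the O(n^3) running time for a fixed grammar.

```c
#include <string.h>

/* Variables of the grammar of Problem 7.3; each is a bit position in a mask. */
enum { S, A, B, C, NVARS };

struct binary_rule { int head, left, right; };   /* head -> left right */
struct terminal_rule { int head; char symbol; }; /* head -> symbol */

static const struct binary_rule binary_rules[] = {
    {S, A, B}, {S, B, C}, {A, B, A}, {B, C, C}, {C, A, B}
};
static const struct terminal_rule terminal_rules[] = {
    {A, 'a'}, {B, 'b'}, {C, 'a'}
};
#define NBIN   (sizeof binary_rules / sizeof binary_rules[0])
#define NTERM  (sizeof terminal_rules / sizeof terminal_rules[0])
#define MAXLEN 64

/* table[i][j]: bitmask of the variables deriving w[i..j] (0-based, inclusive). */
static unsigned table[MAXLEN][MAXLEN];

/* Fills the table for w; returns 1 if S derives all of w, else 0. */
int cyk(const char *w)
{
    size_t n = strlen(w);
    if (n == 0)
        return 0;                      /* a CNF grammar never derives the empty string */
    if (n > MAXLEN)
        return 0;                      /* this sketch only handles short strings */
    for (size_t i = 0; i < n; i++) {   /* substrings of length 1 */
        table[i][i] = 0;
        for (size_t r = 0; r < NTERM; r++)
            if (terminal_rules[r].symbol == w[i])
                table[i][i] |= 1u << terminal_rules[r].head;
    }
    for (size_t len = 2; len <= n; len++)
        for (size_t i = 0; i + len <= n; i++) {
            size_t j = i + len - 1;
            table[i][j] = 0;
            for (size_t k = i; k < j; k++)      /* split into w[i..k] and w[k+1..j] */
                for (size_t r = 0; r < NBIN; r++)
                    if (((table[i][k] >> binary_rules[r].left) & 1u) &&
                        ((table[k + 1][j] >> binary_rules[r].right) & 1u))
                        table[i][j] |= 1u << binary_rules[r].head;
        }
    return (int)((table[0][n - 1] >> S) & 1u);
}
```

After a call, the entries of table can be printed or inspected to answer questions about individual X_{ij}’s.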


Problem 7.4: For the grammar: 


S → AB | CD 
A → BC | a 
B → AC | C 
C → AB | CD 
D → AC | d 


1. Find the generating symbols. Recall, a grammar symbol is generating if 
there is a derivation of at least one terminal string, starting with that 
symbol. 


2. Eliminate all useless productions — those that contain at least one symbol 
that is not a generating symbol. 


3. In the resulting grammar, eliminate all symbols that are not reachable — 
they appear in no string derived from S. 
In the list below, you will find several statements about which symbols are 
generating, which are reachable, and which productions are useless. Select the 
one that is false. 


Problem 7.5: In Fig. 7.15 is a context-free grammar. Find all the nullable 
symbols (those that derive ε in one or more steps). Then, identify the true 
statement from the list below. 


S → AB | CD 
A → BG | 0 
B → AD | ε 
C → CD | 1 
D → BB | E 
E → AF | B1 
F → EG | 0C 
G → AG | BD 


Figure 7.15: A context-free grammar 
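The nullable symbols can be found by a fixpoint iteration: a variable is nullable if it has a body made up only of nullable variables, and an empty body qualifies trivially. The C sketch below reads the productions of Fig. 7.15 as S → AB | CD, A → BG | 0, B → AD | ε, C → CD | 1, D → BB | E, E → AF | B1, F → EG | 0C, G → AG | BD. The encoding (uppercase letters are variables, other characters are terminals, "" is an ε-body) and the function name find_nullable are assumptions.

```c
#include <ctype.h>
#include <string.h>

struct production { char head; const char *body; };

/* Productions of Fig. 7.15; the empty body "" denotes epsilon. */
static const struct production fig7_15[] = {
    {'S', "AB"}, {'S', "CD"}, {'A', "BG"}, {'A', "0"},
    {'B', "AD"}, {'B', ""},   {'C', "CD"}, {'C', "1"},
    {'D', "BB"}, {'D', "E"},  {'E', "AF"}, {'E', "B1"},
    {'F', "EG"}, {'F', "0C"}, {'G', "AG"}, {'G', "BD"}
};
#define NPROD (sizeof fig7_15 / sizeof fig7_15[0])

/* Sets nullable[X - 'A'] = 1 for every variable X that derives epsilon. */
void find_nullable(int nullable[26])
{
    memset(nullable, 0, 26 * sizeof nullable[0]);
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t p = 0; p < NPROD; p++) {
            int h = fig7_15[p].head - 'A';
            if (nullable[h])
                continue;
            int all = 1;  /* a terminal or a non-nullable variable spoils the body */
            for (const char *c = fig7_15[p].body; *c; c++)
                if (!isupper((unsigned char)*c) || !nullable[*c - 'A']) {
                    all = 0;
                    break;
                }
            if (all) {
                nullable[h] = 1;
                changed = 1;
            }
        }
    }
}
```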


Problem 7.6: For the CFG of Fig. 7.15, find all the nullable symbols, and then 
use the construction from Section 7.1.3 to modify the grammar’s productions so 
there are no ε-productions. The language of the grammar should change only 
in that ε will no longer be in the language. 


Problem 7.7: A unit pair (X,Y) for a context-free grammar is a pair where: 
1. X and Y are variables (nonterminals) of the grammar. 


2. There is a derivation X ⇒* Y that uses only unit productions (produc- 
tions with a body that consists of exactly one occurrence of some variable, 
and nothing else). 


For the grammar of Fig. 7.16, identify all the unit pairs. Then, select from the 
list below the pair that is not a unit pair. 


Problem 7.8: Convert the grammar of Fig. 7.16 to an equivalent grammar 
with no unit productions, using the construction of Section 7.1.4. Then, choose 
one of the productions of the new grammar from the list below. 


Problem 7.9: Suppose we execute the Chomsky-normal-form conversion 
algorithm of Section 7.1.5. Let A → BC0DE be one of the productions of the 
given grammar, which has already been freed of ε-productions and unit 
productions. Suppose that in our construction, we introduce new variable X_a to 
derive a terminal a, and when we need to split the right side of a production, we 
use new variables Y_1, Y_2, .... What productions would replace A → BC0DE? 
Identify one of these replacing productions from the list below. 
S → A | B | 2 
A → C0 | D 
B → C1 | E 
C → D | E | 3 
D → E0 | S 
E → D1 | S 


Figure 7.16: Another context-free grammar 
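The unit pairs asked for in Problem 7.7 are the reflexive-transitive closure of the relation “X → Y is a unit production,” so they can be computed mechanically. The C sketch below reads the unit productions of Fig. 7.16 as S → A, S → B, A → D, B → E, C → D, C → E, D → S, and E → S, and closes them with Warshall’s algorithm; the matrix encoding and the name unit_pairs are assumptions.

```c
/* Variables of Fig. 7.16, used as matrix indices. */
enum { S, A, B, C, D, E, NV };

/* unit[X][Y] = 1 when X -> Y is a unit production of Fig. 7.16. */
static const int unit[NV][NV] = {
    /*        S  A  B  C  D  E */
    /* S */ { 0, 1, 1, 0, 0, 0 },
    /* A */ { 0, 0, 0, 0, 1, 0 },
    /* B */ { 0, 0, 0, 0, 0, 1 },
    /* C */ { 0, 0, 0, 0, 1, 1 },
    /* D */ { 1, 0, 0, 0, 0, 0 },
    /* E */ { 1, 0, 0, 0, 0, 0 },
};

/* pair[X][Y] = 1 iff (X, Y) is a unit pair: start from (X, X) and the unit
   productions, then take the transitive closure (Warshall's algorithm). */
void unit_pairs(int pair[NV][NV])
{
    for (int x = 0; x < NV; x++)
        for (int y = 0; y < NV; y++)
            pair[x][y] = (x == y) || unit[x][y];
    for (int k = 0; k < NV; k++)
        for (int x = 0; x < NV; x++)
            for (int y = 0; y < NV; y++)
                if (pair[x][k] && pair[k][y])
                    pair[x][y] = 1;
}
```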


Problem 7.10: G1 is a context-free grammar with start symbol S1, and no 
other nonterminals whose name begins with “S.” Similarly, G2 is a context-free 
grammar with start symbol S2 and no other nonterminals whose name begins 
with “S.” S1 and S2 appear on the right side of no productions. Also, no 
nonterminal appears in both G1 and G2. We wish to combine the symbols and 
productions of G1 and G2 to form a new grammar G, whose language is the 
union of the languages of G1 and G2. The start symbol of G will be S. All 
productions and symbols of G1 and G2 will be symbols and productions of G. 
Which of the following sets of productions, added to those of G, is guaranteed 
to make L(G) be L(G1) ∪ L(G2)? 


Problem 7.11: Under the same assumptions as Problem 7.10, which of the 
following sets of productions is guaranteed to make L(G) be L(G1)L(G2)? 


Problem 7.12: A linear grammar is a context-free grammar in which no 
production body has more than one occurrence of a variable; that is, each body 
contains at most one variable. For example, A → 0B1 or A → 001 could be 
productions of a linear grammar, but A → BB or A → A0B could not. A linear 
language is a language that has at least one linear grammar. 

The following statement is false: “The concatenation of two linear languages 
is a linear language.” To prove it we use a counterexample: We give two 
linear languages L1 and L2 and show that their concatenation is not a linear 
language. Which of the following can serve as a counterexample? 


Problem 7.13: The intersection of two CFL’s need not be a CFL. Identify in 
the list below a pair of CFL’s such that their intersection is not a CFL. 


Problem 7.14: Here is a grammar, whose variables and terminals are not 
named using the usual convention. Any of R through Z could be either a 
variable or terminal; it is your job to figure out which is which, and which 
could be the start symbol. 


R → ST | UV 
T → UV | W 
V → XY | Z 
X → YZ | T 
We do have an important clue: There are no useless productions in this 
grammar; that is, each production is used in some derivation of some terminal string 
from the start symbol. Your job is to figure out which letters definitely represent 
variables, which definitely represent terminals, which could represent either a 
terminal or a nonterminal, and which could be the start symbol. Remember 
that the usual convention, which might imply that all these letters stand for 
either terminals or variables, does not apply here. 


Problem 7.15: Five languages are defined by the following five grammars: 
L1: S → aaSa | ε 
L2: S → aSaa | a 
L3: S → aaA, A → aS | ε 
L4: S → Saaa | aa | ε 
L5: S → aaA | a | ε, A → aS 
Determine: 
1. Which pairs of languages are disjoint? 
2. Which languages are contained in which other languages? 


3. Which languages are complements of one another (with respect to the 
language a*)? 


Then, identify the statement below that is false. 


Problem 7.16: Let L be the language of the grammar: 


S → AB 
A → aAb | aA | ε 
B → bBa | c 


The operation min(L) returns those strings in L such that no proper prefix is 
in L. Describe the language min(L) and identify in the list below the one string 
that is in min(L). 


Problem 7.17: Let L be the language of the grammar: 


S → AB 
A → aAb | aA | ε 
B → bBa | c 


The operation max(L) returns those strings in L that are not a proper prefix 
of any other string in L. Describe the language max(L) and identify in the list 
below the one string that is in max(L). 
7.7 References for Chapter 7 


Chomsky Normal Form comes from [2]. Greibach Normal Form is from [4], 
although the construction outlined in Exercise 7.1.11 is due to M. C. Paull. 

Many of the fundamental properties of context-free languages come from [1]. 
These ideas include the pumping lemma, basic closure properties, and tests for 
simple questions such as emptiness and finiteness of a CFL. In addition [6] is 
the source for the nonclosure under intersection and complementation, and [3] 
provides additional closure results, including closure of the CFL’s under inverse 
homomorphism. Ogden’s lemma comes from [5]. 

The CYK algorithm has three known independent sources. J. Cocke’s work 
was circulated privately and never published. T. Kasami’s rendition of essen- 
tially the same algorithm appeared only in an internal US-Air-Force memoran- 
dum. However, the work of D. Younger was published conventionally [7]. 
Chapter 8 


Introduction to Turing Machines 


In this chapter we change our direction significantly. Until now, we have been 
interested primarily in simple classes of languages and the ways that they can 
be used for relatively constrained problems, such as analyzing protocols, search- 
ing text, or parsing programs. Now, we shall start looking at the question of 
what languages can be defined by any computational device whatsoever. This 
question is tantamount to the question of what computers can do, since recog- 
nizing the strings in a language is a formal way of expressing any problem, and 
solving a problem is a reasonable surrogate for what it is that computers do. 

We begin with an informal argument, using an assumed knowledge of C 
programming, to show that there are specific problems we cannot solve using 
a computer. These problems are called “undecidable.” We then introduce a 
venerable formalism for computers, called the Turing machine. While a Turing 
machine looks nothing like a PC, and would be grossly inefficient should some 
startup company decide to manufacture and sell them, the Turing machine long 
has been recognized as an accurate model for what any physical computing 
device is capable of doing. 

In Chapter 9, we use the Turing machine to develop a theory of “undecid- 
able” problems, that is, problems that no computer can solve. We show that 
a number of problems that are easy to express are in fact undecidable. An ex- 
ample is telling whether a given grammar is ambiguous, and we shall see many 
others. 


8.1 Problems That Computers Cannot Solve 
The purpose of this section is to provide an informal, C-programming-based 
introduction to the proof that a specific problem cannot be solved by computer. 


The particular problem we discuss is whether the first thing a C program prints 
is hello, world. Although we might imagine that simulation of the program 
would allow us to tell what the program does, we must in reality contend with 
programs that take an unimaginably long time before making any output at 
all. This problem — not knowing when, if ever, something will occur — is the 
ultimate cause of our inability to tell what a program does. However, proving 
formally that there is no program to do a stated task is quite tricky, and we 
need to develop some formal mechanics. In this section, we give the intuition 
behind the formal proofs. 


8.1.1 Programs that Print “Hello, World” 


In Fig. 8.1 is the first C program met by students who read Kernighan and 
Ritchie’s classic book.¹ It is rather easy to discover that this program prints 
hello, world and terminates. This program is so transparent that it has 
become a common practice to introduce languages by showing how to write a 
program to print hello, world in those languages. 


main()
{
    printf("hello, world\n");
}


Figure 8.1: Kernighan and Ritchie’s hello-world program 


However, there are other programs that also print hello, world; yet the 
fact that they do so is far from obvious. Figure 8.2 shows another program that 
might print hello, world. It takes an input n, and looks for positive integer 
solutions to the equation x^n + y^n = z^n. If it finds one, it prints hello, world. 
If it never finds integers x, y, and z to satisfy the equation, then it continues 
searching forever, and never prints hello, world. 

To understand what this program does, first observe that exp is an auxiliary 
function to compute exponentials. The main program needs to search through 
triples (x,y, z) in an order such that we are sure we get to every triple of positive 
integers eventually. To organize the search properly, we use a fourth variable, 
total, that starts at 3 and, in the while-loop, is increased one unit at a time, 
eventually reaching any finite integer. Inside the while-loop, we divide total 
into three positive integers x, y, and z, by first allowing x to range from 1 to 
total-2, and within that for-loop allowing y to range from 1 up to one less 
than what x has not already taken from total. What remains, which must be 
between 1 and total-2, is given to z. 

In the innermost loop, the triple (x, y, z) is tested to see if x^n + y^n = z^n. If 
so, the program prints hello, world, and if not, it prints nothing. 


¹B. W. Kernighan and D. M. Ritchie, The C Programming Language, 1978, Prentice-Hall, 
Englewood Cliffs, NJ. 
#include <stdio.h>

int exp(int i, int n)
/* computes i to the power n */
{
    int ans, j;
    ans = 1;
    for (j=1; j<=n; j++) ans *= i;
    return(ans);
}

int main(void)
{
    int n, total, x, y, z;
    scanf("%d", &n);
    total = 3;
    while (1) {
        for (x=1; x<=total-2; x++)
            for (y=1; y<=total-x-1; y++) {
                z = total - x - y;
                if (exp(x,n) + exp(y,n) == exp(z,n))
                    printf("hello, world\n");
            }
        total++;
    }
}


Figure 8.2: Fermat’s last theorem expressed as a hello-world program 


If the value of n that the program reads is 2, then it will eventually find 
combinations of integers such as total = 12, x = 3, y = 4, and z = 5, for which 
x^n + y^n = z^n. Thus, for input 2, the program does print hello, world. 
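Because the search in Fig. 8.2 never stops on its own, it is easier to experiment with a bounded variant. The C sketch below (power and first_triple are hypothetical names) enumerates triples in the same order as Fig. 8.2 but gives up once total exceeds max_total. It uses long long arithmetic, so it remains subject to overflow for large values, the same finite-size concern that real machines face.

```c
/* i to the power n, by repeated multiplication (overflows for large values). */
long long power(long long i, int n)
{
    long long ans = 1;
    for (int j = 1; j <= n; j++)
        ans *= i;
    return ans;
}

/* Searches triples in the order of Fig. 8.2, for total = 3, 4, ..., max_total.
   On success stores the first solution of x^n + y^n = z^n and returns 1;
   returns 0 if none is found within the bound. */
int first_triple(int n, int max_total, int *px, int *py, int *pz)
{
    for (int total = 3; total <= max_total; total++)
        for (int x = 1; x <= total - 2; x++)
            for (int y = 1; y <= total - x - 1; y++) {
                int z = total - x - y;
                if (power(x, n) + power(y, n) == power(z, n)) {
                    *px = x;
                    *py = y;
                    *pz = z;
                    return 1;
                }
            }
    return 0;
}
```

For n = 2, the first triple found is x = 3, y = 4, z = 5, at total = 12, matching the text; for n = 3 no triple turns up within any bound, but a bounded search can never settle the question by itself.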

However, for any integer n > 2, the program will never find a triple of 
positive integers to satisfy x^n + y^n = z^n, and thus will fail to print hello, 
world. Interestingly, until a few years ago, it was not known whether or not this 
program would print hello, world for some large integer n. The claim that it 
would not, i.e., that there are no positive integer solutions to the equation 
x^n + y^n = z^n if n > 2, was made by Fermat 300 years ago, but no proof was 
found until quite recently. This statement is often referred to as “Fermat’s last theorem.” 

Let us define the hello-world problem to be: determine whether a given C 
program, with a given input, prints hello, world as the first 12 characters 
that it prints. In what follows, we often use, as a shorthand, the statement 
about a program that it prints hello, world to mean that it prints hello, 
world as the first 12 characters that it prints. 

It seems likely that, if it takes mathematicians 300 years to resolve a question 
about a single, 22-line program, then the general problem of telling whether a 
given program, on a given input, prints hello, world must be hard indeed. 
In fact, any of the problems that mathematicians have not yet been able to 
resolve can be turned into a question of the form “does this program, with this 
input, print hello, world?” Thus, it would be remarkable indeed if we could 
write a program that could examine any program P and input I for P, and tell 
whether P, run with I as its input, would print hello, world. We shall prove 
that no such program exists. 


Why Undecidable Problems Must Exist 


While it is tricky to prove that a specific problem, such as the “hello- 
world problem” discussed here, must be undecidable, it is quite easy to 
see why almost all problems must be undecidable by any system that 
involves programming. Recall that a “problem” is really membership of a 
string in a language. The number of different languages over any alphabet 
of more than one symbol is not countable. That is, there is no way to 
assign integers to the languages such that every language has an integer, 
and every integer is assigned to one language. 

On the other hand, programs, being finite strings over a finite alphabet 
(typically a subset of the ASCII alphabet), are countable. That is, we can 
order them by length, and for programs of the same length, order them 
lexicographically. Thus, we can speak of the first program, the second 
program, and in general, the ith program for any integer i. 

As a result, we know there are infinitely fewer programs than there are 
problems. If we picked a language at random, almost certainly it would be 
an undecidable problem. The only reason that most problems appear to be 
decidable is that we rarely are interested in random problems. Rather, we 
tend to look at fairly simple, well-structured problems, and indeed these 
are often decidable. However, even among the problems we are interested 
in and can state clearly and succinctly, we find many that are undecidable; 
the hello-world problem is a case in point. 
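The ordering used to count programs (shorter strings first, strings of equal length in lexicographic order) can be made concrete. The C sketch below, with a hypothetical function nth_string, computes the ith string over a given alphabet, so every finite string, and in particular every program text, receives exactly one index. It assumes the alphabet’s characters are listed in increasing order and numbers the empty string 0.

```c
#include <string.h>

/* Writes into out the i-th string (i = 0, 1, 2, ...) over alphabet in
   shortlex order: shorter strings first, equal lengths lexicographically.
   Index 0 is the empty string. Returns 0 if out (cap bytes) is too small. */
int nth_string(unsigned long long i, const char *alphabet, char *out, size_t cap)
{
    size_t k = strlen(alphabet), len = 0;
    if (k == 0 || cap == 0)
        return 0;
    while (i > 0) {                /* bijective base-k digits, least significant first */
        if (len + 1 >= cap)
            return 0;
        i--;
        out[len++] = alphabet[i % k];
        i /= k;
    }
    out[len] = '\0';
    for (size_t a = 0, b = len; a + 1 < b; a++, b--) {   /* reverse in place */
        char t = out[a];
        out[a] = out[b - 1];
        out[b - 1] = t;
    }
    return 1;
}
```

Over the alphabet {a, b}, indices 0, 1, 2, 3, 4, 5, 6, 7 give the empty string, a, b, aa, ab, ba, bb, aaa. The same function applied to a program’s alphabet yields “the ith program.”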


8.1.2 The Hypothetical “Hello, World” Tester 


The proof of impossibility of making the hello-world test is a proof by 
contradiction. That is, we assume there is a program, call it H, that takes as input 
a program P and an input I, and tells whether P with input I prints hello, 
world. Figure 8.3 is a representation of what H does. In particular, the only 
output H makes is either to print the three characters yes or to print the two 
characters no. It always does one or the other. 

If a problem has an algorithm like H, that always tells correctly whether an 
instance of the problem has answer “yes” or “no,” then the problem is said to 
be “decidable.” Otherwise, the problem is “undecidable.” Our goal is to prove 
that H doesn’t exist; i.e., the hello-world problem is undecidable. 


[Figure: inputs I and P enter the hello-world tester H, which outputs yes or no] 

Figure 8.3: A hypothetical program H that is a hello-world detector 

In order to prove that statement by contradiction, we are going to make 
several changes to H, eventually constructing a related program called H2 that 
we show does not exist. Since the changes to H are simple transformations that 
can be done to any C program, the only questionable statement is the existence 
of H, so it is that assumption we have contradicted. 

To simplify our discussion, we shall make a few assumptions about C pro- 
grams. These assumptions make H’s job easier, not harder, so if we can show 
a “hello-world tester” for these restricted programs does not exist, then surely 
there is no such tester that could work for a broader class of programs. Our 
assumptions are: 


1. All output is character-based, e.g., we are not using a graphics package 
or any other facility to make output that is not in the form of characters. 


2. All character-based output is performed using printf, rather than 
putchar() or another character-based output function. 


We now assume that the program H exists. Our first modification is to 
change the output no, which is the response that H makes when its input 
program P does not print hello, world as its first output in response to input 
I. As soon as H prints “n,” we know it will eventually follow with the “o.”² 
Thus, we can modify any printf statement in H that prints “n” to instead 
print hello, world. Another printf statement that prints an “o” but not 
the “n” is omitted. As a result, the new program, which we call H1, behaves 
like H, except it prints hello, world exactly when H would print no. H1 is 
suggested by Fig. 8.4. 

Our next transformation on the program is a bit trickier; it is essentially 
the insight that allowed Alan Turing to prove his undecidability result about 
Turing machines. Since we are really interested in programs that take other 
programs as input and tell something about them, we shall restrict H1 so it: 


a) Takes only input P, not P and I. 


b) Asks what P would do if its input were its own code, i.e., what would H1 
do on inputs P as program and P as input I as well? 


²Most likely, the program would put no in one printf, but it could print the “n” in one 
printf and the “o” in another. 
[Figure: inputs I and P enter H1, which outputs yes or hello, world] 


Figure 8.4: H1 behaves like H, but it says hello, world instead of no 


The modifications we must perform on H1 to produce the program H2 
suggested in Fig. 8.5 are as follows: 


1. H2 first reads the entire input P and stores it in an array A, which it 
“malloc’s” for the purpose.³ 


2. H2 then simulates H1, but whenever H1 would read input from P or I, 
H2 reads from the stored copy in A. To keep track of how much of P and 
I H1 has read, H2 can maintain two cursors that mark positions in A. 


[Figure: input P enters H2, which outputs yes or hello, world] 


Figure 8.5: H2 behaves like H1, but uses its input P as both P and I 


We are now ready to prove H2 cannot exist. It then follows that H1 does not 
exist, and likewise, H does not exist. The heart of the argument is to envision 
what H2 does when given itself as input. This situation is suggested in Fig. 8.6. 
Recall that H2, given any program P as input, makes output yes if P prints 
hello, world when given itself as input. Also, H2 prints hello, world if P, given 
itself as input, does not print hello, world as its first output. 

Suppose that the H2 represented by the box in Fig. 8.6 makes the output 
yes. Then the H2 in the box is saying about its input H2 that H2, given itself 
as input, prints hello, world as its first output. But we just supposed that 
the first output H2 makes in this situation is yes rather than hello, world. 

Thus, it appears that in Fig. 8.6 the output of the box is hello, world, 
since it must be one or the other. But if H2, given itself as input, prints hello, 
world first, then the output of the box in Fig. 8.6 must be yes. Whichever 
output we suppose H2 makes, we can argue that it makes the other output. 


³The UNIX malloc system function allocates a block of memory of a size specified in 
the call to malloc. This function is used when the amount of storage needed cannot be 
determined until the program is run, as would be the case if an input of arbitrary length were 
read. Typically, malloc would be called several times, as more and more input is read and 
progressively more space is needed. 




[Figure: H2 receives its own code as input; does it output yes or hello, world?] 


Figure 8.6: What does H2 do when given itself as input? 


This situation is paradoxical, and we conclude that H2 cannot exist. As a 
result, we have contradicted the assumption that H exists. That is, we have 
proved that no program H can tell whether or not a given program P with 
input I prints hello, world as its first output. 


8.1.3 Reducing One Problem to Another 


Now, we have one problem — does a given program with given input print 
hello, world as the first thing it prints? — that we know no computer program 
can solve. A problem that cannot be solved by computer is called undecidable. 
We shall give the formal definition of “undecidable” in Section 9.3, but for the 
moment, let us use the term informally. Suppose we want to determine whether 
or not some other problem is solvable by a computer. We can try to write a 
program to solve it, but if we cannot figure out how to do so, then we might 
try a proof that there is no such program. 

Perhaps we could prove this new problem undecidable by a technique similar 
to what we did for the hello-world problem: assume there is a program to solve 
it and develop a paradoxical program that must do two contradictory things, 
like the program H2. However, once we have one problem that we know is 
undecidable, we no longer have to prove the existence of a paradoxical situation. 
It is sufficient to show that if we could solve the new problem, then we could use 
that solution to solve a problem we already know is undecidable. The strategy 
is suggested in Fig. 8.7; the technique is called the reduction of P1 to P2. 


[Figure: a P1 instance enters a box labeled “Construct instance”; the resulting 
P2 instance enters a diamond labeled “Decide,” which answers yes or no] 


Figure 8.7: If we could solve problem P2, then we could use its solution to solve 
problem P1 


Suppose that we know problem P1 is undecidable, and P2 is a new problem 
that we would like to prove is undecidable as well. We suppose that there is a 
program represented in Fig. 8.7 by the diamond labeled “decide”; this program 
prints yes or no, depending on whether its input instance of problem P2 is or 
is not in the language of that problem.⁴ 


Can a Computer Really Do All That? 


If we examine a program such as Fig. 8.2, we might ask whether it really 
searches for counterexamples to Fermat’s last theorem. After all, integers 
are only 32 bits long in the typical computer, and if the smallest 
counterexample involved integers in the billions, there would be an overflow error 
before the solution was found. In fact, one could argue that a computer 
with 128 megabytes of main memory and a 30 gigabyte disk has “only” 
256^{30128000000} states, and is thus a finite automaton. 

However, treating computers as finite automata (or treating brains 
as finite automata, which is where the FA idea originated) is 
unproductive. The number of states involved is so large, and the limits so unclear, 
that you don’t draw any useful conclusions. In fact, there is every reason 
to believe that, if we wanted to, we could expand the set of states of a 
computer arbitrarily. 

For instance, we can represent integers as linked lists of digits, of 
arbitrary length. If we run out of memory, the program can print a request 
for a human to dismount its disk, store it, and replace it by an empty disk. 
As time goes on, the computer could print requests to swap among as many 
disks as the computer needs. This program would be far more complex 
than that of Fig. 8.2, but not beyond our capabilities to write. Similar 
tricks would allow any other program to avoid finite limitations on the size 
of memory or on the size of integers or other data items. 
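The linked-list representation of integers mentioned in the box can be sketched in a few dozen lines of C. The names below (struct digit, from_string, add, to_string, free_digits) are hypothetical. Digits are stored least significant first so that addition can walk both lists from the low end, and the only limit on the size of a number is the memory that malloc can supply.

```c
#include <stdlib.h>
#include <string.h>

/* One decimal digit of an arbitrary-length integer, least significant first. */
struct digit { int value; struct digit *next; };

void free_digits(struct digit *d)
{
    while (d != NULL) {
        struct digit *next = d->next;
        free(d);
        d = next;
    }
}

/* Builds the list for a decimal string such as "987". Returns NULL if memory
   runs out (and also for the empty string). */
struct digit *from_string(const char *s)
{
    struct digit *head = NULL;
    for (const char *p = s; *p; p++) {   /* most significant first, so prepend */
        struct digit *d = malloc(sizeof *d);
        if (d == NULL) {
            free_digits(head);
            return NULL;
        }
        d->value = *p - '0';
        d->next = head;
        head = d;
    }
    return head;
}

/* Returns a new list holding a + b (NULL inputs count as zero). */
struct digit *add(const struct digit *a, const struct digit *b)
{
    struct digit *head = NULL, **tail = &head;
    int carry = 0;
    while (a != NULL || b != NULL || carry) {
        int sum = carry + (a ? a->value : 0) + (b ? b->value : 0);
        struct digit *d = malloc(sizeof *d);
        if (d == NULL) {
            free_digits(head);
            return NULL;
        }
        d->value = sum % 10;
        d->next = NULL;
        *tail = d;                 /* append, keeping least significant first */
        tail = &d->next;
        carry = sum / 10;
        if (a) a = a->next;
        if (b) b = b->next;
    }
    return head;
}

/* Writes the decimal form into out; returns 0 if cap bytes are not enough. */
int to_string(const struct digit *d, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return 0;
    for (const struct digit *p = d; p != NULL; p = p->next) {
        if (len + 1 >= cap)
            return 0;
        out[len++] = (char)('0' + p->value);
    }
    out[len] = '\0';
    for (size_t i = 0; i < len / 2; i++) {   /* reverse to most significant first */
        char t = out[i];
        out[i] = out[len - 1 - i];
        out[len - 1 - i] = t;
    }
    return 1;
}
```

Adding 1 to a twenty-digit number of 9’s, which would overflow a 64-bit integer, simply produces a longer list.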

In order to make a proof that problem P2 is undecidable, we have to invent a 
construction, represented by the square box in Fig. 8.7, that converts instances 
of P1 to instances of P2 that have the same answer. That is, any string in the 
language P1 is converted to some string in the language P2, and any string over 
the alphabet of P1 that is not in the language P1 is converted to a string that 
is not in the language P2. Once we have this construction, we can solve P1 as 
follows: 


1. Given an instance of P1, that is, given a string w that may or may not be 
in the language P1, apply the construction algorithm to produce a string 
x. 


2. Test whether x is in P2, and give the same answer about w and P1. 


⁴Recall that a problem is really a language. When we talked of the problem of deciding 
whether a given program and input results in hello, world as the first output, we were really 
talking about strings consisting of a C source program followed by whatever input file(s) the 
program reads. This set of strings is a language over the alphabet of ASCII characters. 
The Direction of a Reduction Is Important 


It is a common mistake to try to prove a problem P2 undecidable by 
reducing P2 to some known undecidable problem P1; i.e., showing the 
statement “if P1 is decidable, then P2 is decidable.” That statement, 
although surely true, is useless, since its hypothesis “P1 is decidable” is 
false. 

The only way to prove a new problem P2 to be undecidable is to 
reduce a known undecidable problem P1 to P2. That way, we prove the 
statement “if P2 is decidable, then P1 is decidable.” The contrapositive of 
that statement is “if P1 is undecidable, then P2 is undecidable.” Since we 
know that P1 is undecidable, we can deduce that P2 is undecidable. 


If w is in P1, then x is in P2, so this algorithm says yes. If w is not in P1, 
then x is not in P2, and the algorithm says no. Either way, it says the truth 
about w. Since we assumed that no algorithm to decide membership of a string 
in P1 exists, we have a proof by contradiction that the hypothesized decision 
algorithm for P2 does not exist; i.e., P2 is undecidable. 
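The two-step algorithm above is a composition: construct, then decide. The C sketch below shows only that shape. Since no decider for an undecidable problem can be written, it uses two deliberately trivial, decidable stand-ins, which are assumptions for illustration: P1 is the set of strings over {a, b} with an even number of a’s, P2 is the set of strings of even length, and construct keeps only the a’s of w, so w is in P1 exactly when x is in P2.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the "decide" diamond: is x in P2 (even length)? */
int decide_p2(const char *x)
{
    return strlen(x) % 2 == 0;
}

/* Toy stand-in for the "construct" box: keep only the a's of w.
   w has an even number of a's exactly when the result has even length.
   The caller frees the result; NULL means memory ran out. */
char *construct(const char *w)
{
    char *x = malloc(strlen(w) + 1);
    size_t n = 0;
    if (x == NULL)
        return NULL;
    for (const char *p = w; *p; p++)
        if (*p == 'a')
            x[n++] = 'a';
    x[n] = '\0';
    return x;
}

/* Decider for P1 obtained by reduction: it gives for w the answer that
   decide_p2 gives for construct(w). Returns -1 if memory runs out. */
int decide_p1(const char *w)
{
    char *x = construct(w);
    if (x == NULL)
        return -1;
    int answer = decide_p2(x);
    free(x);
    return answer;
}
```

The correctness of decide_p1 depends only on the property that construct maps strings in P1 to strings in P2 and strings not in P1 to strings not in P2, which is exactly what the argument above requires of the construction.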


Example 8.1: Let us use this methodology to show that the question “does 
program Q, given input y, ever call function foo” is undecidable. Note that Q 
may not have a function foo, in which case the problem is easy, but the hard 
cases are when Q has a function foo but may or may not reach a call to foo with 
input y. Since we only know one undecidable problem, the role of P1 in Fig. 8.7 
will be played by the hello-world problem. P2 will be the calls-foo problem just 
mentioned. We suppose there is a program that solves the calls-foo problem. 
Our job is to design an algorithm that converts the hello-world problem into 
the calls-foo problem. 

That is, given program Q and its input y, we must construct a program R 
and an input z such that R, with input z, calls foo if and only if Q with input 
y prints hello, world. The construction is not hard: 


1. If Q has a function called foo, rename it and all calls to that function. 
Clearly the new program Q1 does exactly what Q does. 


2. Add to Q1 a function foo. This function does nothing, and is not called. 
The resulting program is Q2. 


3. Modify Q2 to remember the first 12 characters that it prints, storing them 
in a global array A. Let the resulting program be Q3. 


4. Modify Q3 so that whenever it executes any output statement, it then 
checks in the array A to see if it has written 12 characters or more, and 
if so, whether hello, world are the first 12 characters. In that case, call 
the new function foo that was added in item (2). The resulting program 
is R, and input z is the same as y. 
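Steps 3 and 4 amount to wrapping every output statement. A C sketch of such a wrapper follows. The names tracked_printf, first12, nprinted, and foo_called are hypothetical, and the sketch assumes, as in Section 8.1.2, that all output goes through printf, so the program-editing step only has to replace each printf call with tracked_printf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char first12[13];   /* the array A: first 12 characters printed */
static size_t nprinted;    /* how many characters of first12 are filled */
static int foo_called;     /* observable effect of calling foo */

/* The function added in step 2; R calls it only after hello, world came first. */
void foo(void)
{
    foo_called = 1;
}

/* Drop-in replacement for printf in Q3 (steps 3 and 4). */
int tracked_printf(const char *fmt, ...)
{
    char prefix[13];                      /* at most 12 characters are ever needed */
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int n = vprintf(fmt, ap);             /* print exactly as printf would */
    if (vsnprintf(prefix, sizeof prefix, fmt, ap2) < 0)
        prefix[0] = '\0';                 /* formatting failed: capture nothing */
    va_end(ap2);
    va_end(ap);
    for (size_t k = 0; prefix[k] != '\0' && nprinted < 12; k++)
        first12[nprinted++] = prefix[k];  /* step 3: remember the first 12 chars */
    if (nprinted == 12 && strcmp(first12, "hello, world") == 0)
        foo();                            /* step 4 */
    return n;
}
```

If the first 12 characters are anything other than hello, world, the comparison can never succeed later, because first12 is never changed once it is full, so foo is never called.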


Suppose that Q with input y prints hello, world as its first output. Then 
R as constructed will call foo. However, if Q with input y does not print 
hello, world as its first output, then R will never call foo. If we can decide 
whether R with input z calls foo, then we also know whether Q with input y 
(remember y = z) prints hello, world. Since we know that no algorithm to 
decide the hello-world problem exists, and all four steps of the construction of 
R from Q could be carried out by a program that edited the code of programs, 
our assumption that there was a calls-foo tester is wrong. No such program 
exists, and the calls-foo problem is undecidable. 


8.1.4 Exercises for Section 8.1 


Exercise 8.1.1: Give reductions from the hello-world problem to each of the 
problems below. Use the informal style of this section for describing plausi- 
ble program transformations, and do not worry about the real limits such as 
maximum file size or memory size that real computers impose. 


*! a) Given a program and an input, does the program eventually halt; i.e., 
does the program not loop forever on the input? 


b) Given a program and an input, does the program ever produce any out- 
put? 


! c) Given two programs and an input, do the programs produce the same 
output for the given input? 


8.2 The Turing Machine 


The purpose of the theory of undecidable problems is not only to establish the 
existence of such problems — an intellectually exciting idea in its own right — 
but to provide guidance to programmers about what they might or might not be 
able to accomplish through programming. The theory also has great pragmatic 
impact when we discuss, as we shall in Chapter 10, problems that, although
decidable, require large amounts of time to solve. These problems, called
“intractable problems,” tend to present greater difficulty to the programmer 
and system designer than do the undecidable problems. The reason is that, 
while undecidable problems are usually quite obviously so, and their solutions 
are rarely attempted in practice, the intractable problems are faced every day. 
Moreover, they often yield to small modifications in the requirements or to 
heuristic solutions. Thus, the designer is faced quite frequently with having to 
decide whether or not a problem is in the intractable class, and what to do 
about it, if so. 

We need tools that will allow us to prove everyday questions undecidable or
intractable. The technology introduced in Section 8.1 is useful for questions that 
deal with programs, but it does not translate easily to problems in unrelated 
domains. For example, we would have great difficulty reducing the hello-world 
problem to the question of whether a grammar is ambiguous. 

As a result, we need to rebuild our theory of undecidability, based not on 
programs in C or another language, but based on a very simple model of a com- 
puter, called the Turing machine. This device is essentially a finite automaton 
that has a single tape of infinite length on which it may read and write data. 
One advantage of the Turing machine over programs as a representation of what
can be computed is that the Turing machine is sufficiently simple that we can 
represent its configuration precisely, using a simple notation much like the ID’s 
of a PDA. In comparison, while C programs have a state, involving all the vari- 
ables in whatever sequence of function calls have been made, the notation for 
describing these states is far too complex to allow us to make understandable, 
formal proofs. 

Using the Turing machine notation, we shall prove undecidable certain prob- 
lems that appear unrelated to programming. For instance, we shall show in 
Section 9.4 that “Post’s Correspondence Problem,” a simple question involving 
two lists of strings, is undecidable, and this problem makes it easy to show 
questions about grammars, such as ambiguity, to be undecidable. Likewise, 
when we introduce intractable problems we shall find that certain questions, 
seemingly having little to do with computation (e.g., satisfiability of boolean 
formulas), are intractable. 


8.2.1 The Quest to Decide All Mathematical Questions 


At the turn of the 20th century, the mathematician D. Hilbert asked whether 
it was possible to find an algorithm for determining the truth or falsehood of 
any mathematical proposition. In particular, he asked if there was a way to 
determine whether any formula in the first-order predicate calculus, applied 
to integers, was true. Since the first-order predicate calculus of integers is 
sufficiently powerful to express statements like “this grammar is ambiguous,” 
or “this program prints hello, world,” had Hilbert been successful, these 
problems would have algorithms that we now know do not exist. 

However, in 1931, K. Gödel published his famous incompleteness theorem. 
He constructed a formula in the predicate calculus applied to integers, which 
asserted that the formula itself could be neither proved nor disproved within 
the predicate calculus. Gödel’s technique resembles the construction of the
self-contradictory program H2 in Section 8.1.2, but deals with functions on the
integers, rather than with C programs. 

The predicate calculus was not the only notion that mathematicians had for 
“any possible computation.” In fact, the predicate calculus, being declarative rather
than computational, had to compete with a variety of notations, including the 
“partial-recursive functions,” a rather programming-language-like notation, and 
other similar notations. In 1936, A. M. Turing proposed the Turing machine
as a model of “any possible computation.” This model is computer-like, rather
than program-like, even though true electronic, or even electromechanical, com-
puters were several years in the future (and Turing himself was involved in the
construction of such a machine during World War II).

Interestingly, all the serious proposals for a model of computation have the 
same power; that is, they compute the same functions or recognize the same 
languages. The unprovable assumption that any general way to compute will 
allow us to compute only the partial-recursive functions (or equivalently, what 
Turing machines or modern-day computers can compute) is known as Church’s 
hypothesis (after the logician A. Church) or the Church-Turing thesis. 


8.2.2 Notation for the Turing Machine 


We may visualize a Turing machine as in Fig. 8.8. The machine consists of 
a finite control, which can be in any of a finite set of states. There is a tape 
divided into squares or cells; each cell can hold any one of a finite number of 
symbols. 


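[Figure: a finite control attached to a tape head that scans one cell of the tape; the tape reads ... B B X1 X2 ... Xi ... Xn B B ...]


Figure 8.8: A Turing machine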


Initially, the input, which is a finite-length string of symbols chosen from the 
input alphabet, is placed on the tape. All other tape cells, extending infinitely 
to the left and right, initially hold a special symbol called the blank. The blank 
is a tape symbol, but not an input symbol, and there may be other tape symbols 
besides the input symbols and the blank, as well. 

There is a tape head that is always positioned at one of the tape cells. The 
Turing machine is said to be scanning that cell. Initially, the tape head is at 
the leftmost cell that holds the input. 

A move of the Turing machine is a function of the state of the finite control 
and the tape symbol scanned. In one move, the Turing machine will: 


1. Change state. The next state optionally may be the same as the current 
state. 


2. Write a tape symbol in the cell scanned. This tape symbol replaces what- 
ever symbol was in that cell. Optionally, the symbol written may be the 
same as the symbol currently there. 

3. Move the tape head left or right. In our formalism we require a move,
and do not allow the head to remain stationary. This restriction does 
not constrain what a Turing machine can compute, since any sequence of 
moves with a stationary head could be condensed, along with the next 
tape-head move, into a single state change, a new tape symbol, and a 
move left or right. 


The formal notation we shall use for a Turing machine (TM) is similar to 
that used for finite automata or PDA’s. We describe a TM by the 7-tuple 


$M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$


whose components have the following meanings: 


Q: The finite set of states of the finite control.


Σ: The finite set of input symbols.


Γ: The complete set of tape symbols; Σ is always a subset of Γ.


δ: The transition function. The arguments of δ(q, X) are a state q and a
tape symbol X. The value of δ(q, X), if it is defined, is a triple (p, Y, D),
where:

1. p is the next state, in Q.

2. Y is the symbol, in Γ, written in the cell being scanned, replacing
whatever symbol was there.

3. D is a direction, either L or R, standing for “left” or “right,” respec-
tively, and telling us the direction in which the head moves.


q0: The start state, a member of Q, in which the finite control is found initially.


B: The blank symbol. This symbol is in Γ but not in Σ; i.e., it is not an input
symbol. The blank appears initially in all but the finite number of initial
cells that hold input symbols.


F: The set of final or accepting states, a subset of Q.
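The seven components translate directly into a data structure. The Python sketch below is one possible encoding and is not part of the book's formalism. It assumes that states and tape symbols are strings, and it represents the partial function δ as a dictionary in which a missing key means "no move." The class name `TM` and the method `check` are hypothetical. `check` tests only the conditions listed above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# delta(q, X) = (p, Y, D): next state, symbol written, direction "L" or "R"
Move = Tuple[str, str, str]


@dataclass
class TM:
    states: FrozenSet[str]              # Q
    input_symbols: FrozenSet[str]       # Sigma
    tape_symbols: FrozenSet[str]        # Gamma
    delta: Dict[Tuple[str, str], Move]  # partial: a missing key means "no move"
    start: str                          # q0
    blank: str                          # B
    final: FrozenSet[str]               # F

    def check(self) -> None:
        """Raise ValueError if the seven components break the conditions listed above."""
        if not self.input_symbols <= self.tape_symbols:
            raise ValueError("Sigma must be a subset of Gamma")
        if self.blank not in self.tape_symbols or self.blank in self.input_symbols:
            raise ValueError("B must be in Gamma but not in Sigma")
        if self.start not in self.states or not self.final <= self.states:
            raise ValueError("q0 must be in Q, and F must be a subset of Q")
        for (q, X), (p, Y, D) in self.delta.items():
            if {q, p} - self.states or {X, Y} - self.tape_symbols:
                raise ValueError(f"delta({q}, {X}) uses an unknown state or symbol")
            if D not in ("L", "R"):
                raise ValueError(f"delta({q}, {X}) has direction {D!r}; must be L or R")
```

Because a stationary head is not allowed, a direction other than L or R is rejected.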


8.2.3 Instantaneous Descriptions for Turing Machines 


In order to describe formally what a Turing machine does, we need to develop 
a notation for configurations or instantaneous descriptions (ID’s), like the no- 
tation we developed for PDA’s. Since a TM, in principle, has an infinitely long 
tape, we might imagine that it is impossible to describe the configurations of a 
TM succinctly. However, after any finite number of moves, the TM can have 
visited only a finite number of cells, even though the number of cells visited can 
eventually grow beyond any finite limit. Thus, in every ID, there is an infinite 
prefix and an infinite suffix of cells that have never been visited. These cells 
must all hold either blanks or one of the finite number of input symbols. We
thus show in an ID only the cells between the leftmost and the rightmost non- 
blanks. Under special conditions, when the head is scanning one of the leading 
or trailing blanks, a finite number of blanks to the left or right of the nonblank 
portion of the tape must also be included in the ID. 

In addition to representing the tape, we must represent the finite control and 
the tape-head position. To do so, we embed the state in the tape, and place it 
immediately to the left of the cell scanned. To disambiguate the tape-plus-state 
string, we have to make sure that we do not use as a state any symbol that 
is also a tape symbol. However, it is easy to change the names of the states 
so they have nothing in common with the tape symbols, since the operation of 
the TM does not depend on what the states are called. Thus, we shall use the 
string $X_1X_2\cdots X_{i-1}qX_iX_{i+1}\cdots X_n$ to represent an ID in which


1. q is the state of the Turing machine. 
2. The tape head is scanning the ith symbol from the left. 


3. $X_1X_2\cdots X_n$ is the portion of the tape between the leftmost and the right-
most nonblank. As an exception, if the head is to the left of the leftmost
nonblank or to the right of the rightmost nonblank, then some prefix or
suffix of $X_1X_2\cdots X_n$ will be blank, and i will be 1 or n, respectively.


We describe moves of a Turing machine $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$ by the $\vdash_M$
notation that was used for PDA’s. When the TM M is understood, we shall
use just $\vdash$ to reflect moves. As usual, $\vdash^*_M$, or just $\vdash^*$, will be used to indicate
zero, one, or more moves of the TM M.
Suppose $\delta(q, X_i) = (p, Y, L)$; i.e., the next move is leftward. Then


$X_1X_2\cdots X_{i-1}qX_iX_{i+1}\cdots X_n \vdash_M X_1X_2\cdots X_{i-2}pX_{i-1}YX_{i+1}\cdots X_n$


Notice how this move reflects the change to state p and the fact that the tape
head is now positioned at cell i − 1. There are two important exceptions:


1. If i = 1, then M moves to the blank to the left of $X_1$. In that case,
$qX_1X_2\cdots X_n \vdash_M pBYX_2\cdots X_n$
2. If i = n and Y = B, then the symbol B written over $X_n$ joins the infinite
sequence of trailing blanks and does not appear in the next ID. Thus,
$X_1X_2\cdots X_{n-1}qX_n \vdash_M X_1X_2\cdots X_{n-2}pX_{n-1}$


Now, suppose $\delta(q, X_i) = (p, Y, R)$; i.e., the next move is rightward. Then
$X_1X_2\cdots X_{i-1}qX_iX_{i+1}\cdots X_n \vdash_M X_1X_2\cdots X_{i-1}YpX_{i+1}\cdots X_n$


Here, the move reflects the fact that the head has moved to cell i + 1. Again
there are two important exceptions:


1. If i = n, then the (i + 1)st cell holds a blank, and that cell was not part of
the previous ID. Thus, we instead have
$X_1X_2\cdots X_{n-1}qX_n \vdash_M X_1X_2\cdots X_{n-1}YpB$


2. If i = 1 and Y = B, then the symbol B written over $X_1$ joins the infinite
sequence of leading blanks and does not appear in the next ID. Thus,
$qX_1X_2\cdots X_n \vdash_M pX_2\cdots X_n$
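The move rules above, including the four exceptions, fit in a few lines of code. This Python sketch assumes that every tape symbol is a single character (states may be longer strings), that δ is given as a dictionary, and that an ID is stored as a triple (α, q, β) with the head on the first symbol of β. The names `step` and `show` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Rules = Dict[Tuple[str, str], Tuple[str, str, str]]
# The ID X1...X(i-1) q Xi...Xn is stored as (alpha, q, beta); beta[0] is the scanned symbol.
ID = Tuple[str, str, str]


def step(delta: Rules, current: ID, blank: str = "B") -> Optional[ID]:
    """Return the ID that follows `current` after one move, or None if M has no move."""
    alpha, q, beta = current
    X = beta[0] if beta else blank
    if (q, X) not in delta:
        return None
    p, Y, D = delta[(q, X)]
    rest = beta[1:]                              # X(i+1)...Xn
    if D == "L":
        # Exception 1: if i = 1, the head moves onto the blank left of X1.
        scanned = alpha[-1] if alpha else blank
        # Exception 2: if i = n and Y = B, the new blank joins the trailing blanks.
        right = "" if (not rest and Y == blank) else Y + rest
        return (alpha[:-1], p, scanned + right)
    # Rightward move.
    # Exception 2: if i = 1 and Y = B, the new blank joins the leading blanks.
    left = "" if (not alpha and Y == blank) else alpha + Y
    # Exception 1: if i = n, the head lands on a blank that was not in the old ID.
    return (left, p, rest if rest else blank)


def show(current: ID) -> str:
    """Write an ID in the book's notation, with the state just left of the scanned cell."""
    alpha, q, beta = current
    return alpha + q + beta
```

For example, with δ(q0, 0) = (q1, X, R), `show(step(delta, ("", "q0", "0011")))` gives `"Xq1011"`.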


Example 8.2: Let us design a Turing machine and see how it behaves on a
typical input. The TM we construct will accept the language $\{0^n1^n \mid n \ge 1\}$.
Initially, it is given a finite sequence of 0’s and 1’s on its tape, preceded and 
followed by an infinity of blanks. Alternately, the TM will change a 0 to an X 
and then a 1 to a Y, until all 0’s and 1’s have been matched. 

In more detail, starting at the left end of the input, it enters a loop in which 
it changes a 0 to an X and moves to the right over whatever 0’s and Y’s it sees, 
until it comes to a 1. It changes the 1 to a Y, and moves left, over Y’s and 0’s, 
until it finds an X. At that point, it looks for a 0 immediately to the right, and 
if it finds one, changes it to X and repeats the process, changing a matching 1
to a Y.

If the nonblank input is not in $0^*1^*$, then the TM will eventually fail to have
a next move and will die without accepting. However, if it finishes changing all
the 0’s to X’s on the same round it changes the last 1 to a Y, then it has found
its input to be of the form $0^n1^n$ and accepts. The formal specification of the
TM M is 


$M = (\{q_0, q_1, q_2, q_3, q_4\}, \{0, 1\}, \{0, 1, X, Y, B\}, \delta, q_0, B, \{q_4\})$


where δ is given by the table in Fig. 8.9.


                           Symbol
State   0           1           X           Y           B
q0      (q1, X, R)  -           -           (q3, Y, R)  -
q1      (q1, 0, R)  (q2, Y, L)  -           (q1, Y, R)  -
q2      (q2, 0, L)  -           (q0, X, R)  (q2, Y, L)  -
q3      -           -           -           (q3, Y, R)  (q4, B, R)
q4      -           -           -           -           -


Figure 8.9: A Turing machine to accept $\{0^n1^n \mid n \ge 1\}$


As M performs its computation, the portion of the tape that M’s tape head
has visited will always be a sequence of symbols described by the regular
expression $X^*0^*Y^*1^*$. That is, there will be some 0’s that have been changed
to X’s, followed by some 0’s that have not yet been changed to X’s. Then there
are some 1’s that were changed to Y’s, and 1’s that have not yet been changed
to Y’s. There may or may not be some 0’s and 1’s following.

State q0 is the initial state, and M also enters state q0 every time it returns
to the leftmost remaining 0. If M is in state q0 and scanning a 0, the rule in
the upper-left corner of Fig. 8.9 tells it to go to state q1, change the 0 to an X,
and move right. Once in state q1, M keeps moving right over all 0’s and Y’s
that it finds on the tape, remaining in state q1. If M sees an X or a B, it dies.
However, if M sees a 1 when in state q1, it changes that 1 to a Y, enters state
q2, and starts moving left.

In state q2, M moves left over 0’s and Y’s, remaining in state q2. When 
M reaches the rightmost X, which marks the right end of the block of 0’s that 
have already been changed to X, M returns to state q0 and moves right. There
are two cases: 


1. If M now sees a 0, then it repeats the matching cycle we have just de- 
scribed. 


2. If M sees a Y, then it has changed all the 0’s to X’s. If all the 1’s have 
been changed to Y’s, then the input was of the form $0^n1^n$, and M should
accept. Thus, M enters state q3, and starts moving right, over Y’s. If 
the first symbol other than a Y that M sees is a blank, then indeed there 
were an equal number of 0’s and 1’s, so M enters state q4 and accepts. 
On the other hand, if M encounters another 1, then there are too many 
1’s, so M dies without accepting. If it encounters a 0, then the input was 
of the wrong form, and M also dies. 


Here is an example of an accepting computation by M. Its input is 0011.
Initially, M is in state q0, scanning the first 0, i.e., M’s initial ID is $q_00011$.
The entire sequence of moves of M is:


$q_00011 \vdash Xq_1011 \vdash X0q_111 \vdash Xq_20Y1 \vdash q_2X0Y1 \vdash$
$Xq_00Y1 \vdash XXq_1Y1 \vdash XXYq_11 \vdash XXq_2YY \vdash Xq_2XYY \vdash$
$XXq_0YY \vdash XXYq_3Y \vdash XXYYq_3B \vdash XXYYBq_4B$


For another example, consider what M does on the input 0010, which is not
in the language accepted.


$q_00010 \vdash Xq_1010 \vdash X0q_110 \vdash Xq_20Y0 \vdash q_2X0Y0 \vdash$
$Xq_00Y0 \vdash XXq_1Y0 \vdash XXYq_10 \vdash XXY0q_1B$


The behavior of M on 0010 resembles the behavior on 0011, until in ID $XXYq_10$
M scans the final 0 for the first time. M must move right, staying in state q1,
which takes it to the ID $XXY0q_1B$. However, in state q1 M has no move on
tape symbol B; thus M dies and does not accept its input.
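The two computations above can be replayed mechanically. The Python sketch below encodes Fig. 8.9 as a dictionary and prints IDs in the same notation, so each trace can be compared with the text line by line. Storing the tape as a sparse dictionary (absent cells are blank) is a choice made here, not part of the text, and the names `DELTA`, `run`, and `snapshot` are hypothetical.

```python
from typing import Dict, List, Tuple

# Fig. 8.9: (state, scanned symbol) -> (next state, symbol written, direction)
DELTA: Dict[Tuple[str, str], Tuple[str, str, str]] = {
    ("q0", "0"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"), ("q1", "1"): ("q2", "Y", "L"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q2", "0"): ("q2", "0", "L"), ("q2", "X"): ("q0", "X", "R"), ("q2", "Y"): ("q2", "Y", "L"),
    ("q3", "Y"): ("q3", "Y", "R"), ("q3", "B"): ("q4", "B", "R"),
}
ACCEPTING = {"q4"}


def snapshot(tape: Dict[int, str], head: int, state: str) -> str:
    """Render the ID: nonblank cells plus the scanned cell, with the state left of the head."""
    used = [i for i, c in tape.items() if c != "B"] + [head]
    return "".join((state if i == head else "") + tape.get(i, "B")
                   for i in range(min(used), max(used) + 1))


def run(w: str, max_steps: int = 10_000) -> Tuple[bool, List[str]]:
    """Simulate M on w; return (accepted?, list of IDs visited)."""
    tape = dict(enumerate(w))          # cells absent from the dict hold the blank B
    head, state, trace = 0, "q0", []
    for _ in range(max_steps):
        trace.append(snapshot(tape, head, state))
        if state in ACCEPTING:
            return True, trace         # q4 has no moves, so M halts when it accepts
        move = DELTA.get((state, tape.get(head, "B")))
        if move is None:
            return False, trace        # no next move: M dies without accepting
        state, written, direction = move
        tape[head] = written
        head += 1 if direction == "R" else -1
    raise RuntimeError("step limit reached")
```

For example, `run("0010")` returns `False` with a trace ending in `"XXY0q1B"`, matching the ID at which M dies above.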

8.2.4 Transition Diagrams for Turing Machines


We can represent the transitions of a Turing machine pictorially, much as we 
did for the PDA. A transition diagram consists of a set of nodes corresponding 
to the states of the TM. An arc from state q to state p is labeled by one or 
more items of the form X/Y D, where X and Y are tape symbols, and D is
a direction, either L or R. That is, whenever δ(q, X) = (p, Y, D), we find the
label X/Y D on the arc from q to p. However, in our diagrams, the direction D
is represented pictorially by ← for “left” and → for “right.”

As for other kinds of transition diagrams, we represent the start state by the 
word “Start” and an arrow entering that state. Accepting states are indicated 
by double circles. Thus, the only information about the TM one cannot read 
directly from the diagram is the symbol used for the blank. We shall assume 
that symbol is B unless we state otherwise. 


Example 8.3: Figure 8.10 shows the transition diagram for the Turing ma- 
chine of Example 8.2, whose transition function was given in Fig. 8.9. 


Figure 8.10: Transition diagram for a TM that accepts strings of the form $0^n1^n$


Example 8.4: While today we find it most convenient to think of Turing ma- 
chines as recognizers of languages, or equivalently, solvers of problems, Turing’s 
original view of his machine was as a computer of integer-valued functions. In 
his scheme, integers were represented in unary, as blocks of a single character, 
and the machine computed by changing the lengths of the blocks or by con- 
structing new blocks elsewhere on the tape. In this simple example, we shall 
show how a Turing machine might compute the function ∸, which is called
monus or proper subtraction and is defined by m ∸ n = max(m − n, 0). That
is, m ∸ n is m − n if m ≥ n and 0 if m < n.




A TM that performs this operation is specified by 


$M = (\{q_0, q_1, \ldots, q_6\}, \{0, 1\}, \{0, 1, B\}, \delta, q_0, B)$


Note that, since this TM is not used to accept inputs, we have omitted the 
seventh component, which is the set of accepting states. M will start with a 
tape consisting of $0^m10^n$ surrounded by blanks. M halts with $0^{m \dot{-} n}$ on its tape,
surrounded by blanks.

M repeatedly finds its leftmost remaining 0 and replaces it by a blank. It 
then searches right, looking for a 1. After finding a 1, it continues right, until it 
comes to a 0, which it replaces by a 1. M then returns left, seeking the leftmost
0, which it identifies when it first meets a blank and then moves one cell to the 
right. The repetition ends if either: 


1. Searching right for a 0, M encounters a blank. Then the n 0’s in $0^m10^n$
have all been changed to 1’s, and n + 1 of the m 0’s have been changed
to B. M replaces the n + 1 1’s by one 0 and n B’s, leaving m − n 0’s on
the tape. Since m > n in this case, m − n = m ∸ n.


2. Beginning the cycle, M cannot find a 0 to change to a blank, because the
first m 0’s already have been changed to B. Then n ≥ m, so m ∸ n = 0.
M replaces all remaining 1’s and 0’s by B and ends with a completely
blank tape.


Figure 8.11 gives the rules of the transition function δ, and we have also
represented δ as a transition diagram in Fig. 8.12. The following is a summary
of the role played by each of the seven states:


q0: This state begins the cycle, and also breaks the cycle when appropriate.
If M is scanning a 0, the cycle must repeat. The 0 is replaced by B, the
head moves right, and state q1 is entered. On the other hand, if M is
scanning 1, then all possible matches between the two groups of 0’s on
the tape have been made, and M goes to state q5 to make the tape blank.


q1: In this state, M searches right, through the initial block of 0’s, looking
for the leftmost 1. When found, M goes to state q2.


q2: M moves right, skipping over 1’s, until it finds a 0. It changes that 0 to 
a 1, turns leftward, and enters state q3. However, it is also possible that 
there are no more 0’s left after the block of 1’s. In that case, M in state 
q2 encounters a blank. We have case (1) described above, where n 0’s in 
the second block of 0’s have been used to cancel n of the m 0’s in the first 
block, and the subtraction is complete. M enters state q4, whose purpose 
is to convert the 1’s on the tape to blanks. 


q3: M moves left, skipping over 0’s and 1’s, until it finds a blank. When it 
finds B, it moves right and returns to state q0, beginning the cycle again.

Figure 8.12: Transition diagram for the TM of Example 8.4

q4: Here, the subtraction is complete, but one unmatched 0 in the first block
was incorrectly changed to a B. M therefore moves left, changing 1’s to
B’s, until it encounters a B on the tape. It changes that B back to 0, and
enters state q6, wherein M halts.


q5: State q5 is entered from q0 when it is found that all 0’s in the first block
have been changed to B. In this case, described in (2) above, the result
of the proper subtraction is 0. M changes all remaining 0’s and 1’s to B
and enters state q6.


q6: The sole purpose of this state is to allow M to halt when it has finished
its task. If the subtraction had been a subroutine of some more complex
function, then q6 would initiate the next step of that larger computation.
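To check the two cases mechanically, the following Python sketch runs this TM on $0^m10^n$ and counts the 0's left when it halts. Fig. 8.11 itself is not reproduced in this text, so the rule table below is assembled from the state-by-state summary above. Two details come from working through that summary. First, q4 must also move left over the surviving 0's of the first block before it reaches the B. Second, the head directions on the final moves into q6 are assumptions; they do not affect the result. The function name `monus` is hypothetical.

```python
from typing import Dict, Tuple

# Rules assembled from the state summary above (Fig. 8.11 is not reproduced here).
# (state, scanned symbol) -> (next state, symbol written, direction)
DELTA: Dict[Tuple[str, str], Tuple[str, str, str]] = {
    ("q0", "0"): ("q1", "B", "R"), ("q0", "1"): ("q5", "B", "R"),
    ("q1", "0"): ("q1", "0", "R"), ("q1", "1"): ("q2", "1", "R"),
    ("q2", "0"): ("q3", "1", "L"), ("q2", "1"): ("q2", "1", "R"), ("q2", "B"): ("q4", "B", "L"),
    ("q3", "0"): ("q3", "0", "L"), ("q3", "1"): ("q3", "1", "L"), ("q3", "B"): ("q0", "B", "R"),
    ("q4", "0"): ("q4", "0", "L"), ("q4", "1"): ("q4", "B", "L"), ("q4", "B"): ("q6", "0", "R"),
    ("q5", "0"): ("q5", "B", "R"), ("q5", "1"): ("q5", "B", "R"), ("q5", "B"): ("q6", "B", "R"),
}


def monus(m: int, n: int) -> int:
    """Run M on 0^m 1 0^n and return the number of 0's on the tape when it halts."""
    tape = dict(enumerate("0" * m + "1" + "0" * n))  # cells absent from the dict are blank
    head, state = 0, "q0"
    while (state, tape.get(head, "B")) in DELTA:     # q6 has no moves, so M halts there
        state, written, direction = DELTA[(state, tape.get(head, "B"))]
        tape[head] = written
        head += 1 if direction == "R" else -1
    assert state == "q6", f"unexpected halt in state {state}"
    return sum(1 for c in tape.values() if c == "0")
```

Comparing `monus(m, n)` with `max(m - n, 0)` for small m and n is a quick way to confirm that cases (1) and (2) cover every input of this form.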


8.2.5 The Language of a Turing Machine 


We have intuitively suggested the way that a Turing machine accepts a lan- 
guage. The input string is placed on the tape, and the tape head begins at the 
leftmost input symbol. If the TM eventually enters an accepting state, then 
the input is accepted, and otherwise not. 

More formally, let $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$ be a Turing machine. Then
$L(M)$ is the set of strings w in $\Sigma^*$ such that $q_0w \vdash^*_M \alpha p\beta$ for some state p in F
and any tape strings α and β. This definition was assumed when we discussed
the Turing machine of Example 8.2, which accepts strings of the form $0^n1^n$.

The set of languages we can accept using a Turing machine is often called 
the recursively enumerable languages or RE languages. The term “recursively 
enumerable” comes from computational formalisms that predate the Turing 
machine but that define the same class of languages or arithmetic functions. 
We discuss the origins of the term as an aside (box) in Section 9.2.1. 


8.2.6 Turing Machines and Halting 


There is another notion of “acceptance” that is commonly used for Turing 
machines: acceptance by halting. We say a TM halts if it enters a state q, 
scanning a tape symbol X, and there is no move in this situation; i.e., δ(q, X)
is undefined.


Example 8.5: The Turing machine M of Example 8.4 was not designed to 
accept a language; rather we viewed it as computing an arithmetic function. 
Note, however, that M halts on all strings of 0’s and 1’s, since no matter what 
string M finds on its tape, it will eventually cancel its second group of 0’s, if it 
can find such a group, against its first group of 0’s, and thus must reach state
q6 and halt.

Notational Conventions for Turing Machines


The symbols we normally use for Turing machines resemble those for the 
other kinds of automata we have seen. 


1. Lower-case letters at the beginning of the alphabet stand for input 
symbols. 


2. Capital letters, typically near the end of the alphabet, are used for
tape symbols that may or may not be input symbols. However, B is
generally used for the blank symbol.


3. Lower-case letters near the end of the alphabet are strings of input
symbols.


4. Greek letters are strings of tape symbols.


5. Letters such as q, p, and nearby letters are states.


We can always assume that a TM halts if it accepts. That is, without
changing the language accepted, we can make δ(q, X) undefined whenever q is
an accepting state. In general, without otherwise stating so:


• We assume that a TM always halts when it is in an accepting state.


Unfortunately, it is not always possible to require that a TM halts even 
if it does not accept. Those languages with Turing machines that do halt 
eventually, regardless of whether or not they accept, are called recursive, and 
we shall consider their important properties starting in Section 9.2.1. Turing 
machines that always halt, regardless of whether or not they accept, are a good 
model of an “algorithm.” If an algorithm to solve a given problem exists, then 
we say the problem is “decidable,” so TM’s that always halt figure importantly 
into decidability theory in Chapter 9. 
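A step-bounded simulator shows why halting matters. The Python sketch below runs a TM for at most a fixed number of moves. It can report acceptance, or halting without acceptance, but when the budget runs out it can only answer "unknown." Merely running M cannot separate a machine that loops forever from one that has not finished yet. The function `run_bounded` and the machine `TOY` are hypothetical, made up for illustration. `TOY` accepts inputs that begin with 0, halts without accepting on the empty input, and moves right forever on inputs that begin with 1.

```python
from typing import Dict, Set, Tuple

Rules = Dict[Tuple[str, str], Tuple[str, str, str]]


def run_bounded(delta: Rules, final: Set[str], w: str, budget: int,
                start: str = "q0", blank: str = "B") -> str:
    """Simulate at most `budget` moves; report 'accept', 'halt-reject', or 'unknown'."""
    tape = dict(enumerate(w))
    head, state = 0, start
    for _ in range(budget):
        if state in final:
            return "accept"                 # convention above: M halts once it accepts
        move = delta.get((state, tape.get(head, blank)))
        if move is None:
            return "halt-reject"            # M halted without accepting
        state, written, direction = move
        tape[head] = written
        head += 1 if direction == "R" else -1
    # Budget used up: M may loop forever, or it may just need more moves.
    return "accept" if state in final else "unknown"


# Hypothetical toy TM: accept on a leading 0, halt on empty input, run right forever on a leading 1.
TOY: Rules = {
    ("q0", "0"): ("qa", "0", "R"),
    ("q0", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "B"): ("q1", "B", "R"),
}
```

On input 10, `run_bounded(TOY, {"qa"}, "10", budget)` answers "unknown" for every budget. Raising the budget never turns that answer into "halt-reject."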


8.2.7 Exercises for Section 8.2 


Exercise 8.2.1: Show the ID’s of the Turing machine of Fig. 8.9 if the input 
tape contains: 


* a) 00. 
b) 000111. 
c) 00111. 


! Exercise 8.2.2: Design Turing machines for the following languages: 


* a) The set of strings with an equal number of 0’s and 1’s.
b) $\{a^nb^nc^n \mid n \ge 1\}$.
c) $\{ww^R \mid w$ is any string of 0’s and 1’s$\}$.


Exercise 8.2.3: Design a Turing machine that takes as input a number N and 
adds 1 to it in binary. To be precise, the tape initially contains a $ followed by
N in binary. The tape head is initially scanning the $ in state q0. Your TM
should halt with N + 1, in binary, on its tape, scanning the leftmost symbol of
N + 1, in state qf. You may destroy the $ in creating N + 1, if necessary. For
instance, q0$10011 ⊢* $qf10100, and q0$11111 ⊢* qf100000.


a) Give the transitions of your Turing machine, and explain the purpose of 
each state. 


b) Show the sequence of ID’s of your TM when given input $111. 


Exercise 8.2.4: In this exercise we explore the equivalence between function 
computation and language recognition for Turing machines. For simplicity, we 
shall consider only functions from nonnegative integers to nonnegative integers, 
but the ideas of this problem apply to any computable functions. Here are the 
two central definitions: 


• Define the graph of a function f to be the set of all strings of the form
[x, f(x)], where x is a nonnegative integer in binary, and f(x) is the value 
of function f with argument x, also written in binary. 


• A Turing machine is said to compute function f if, started with any non-
negative integer x on its tape, in binary, it halts (in any state) with f(x),
in binary, on its tape. 


Answer the following, with informal, but clear constructions. 


a) Show how, given a TM that computes f, you can construct a TM that 
accepts the graph of f as a language. 


b) Show how, given a TM that accepts the graph of f, you can construct a 
TM that computes f. 


c) A function is said to be partial if it may be undefined for some arguments. 
If we extend the ideas of this exercise to partial functions, then we do not 
require that the TM computing f halts if its input x is one of the integers 
for which f(x) is not defined. Do your constructions for parts (a) and (b) 
work if the function f is partial? If not, explain how you could modify 
the construction to make it work. 




Exercise 8.2.5: Consider the Turing machine 


$M = (\{q_0, q_1, q_2, q_f\}, \{0, 1\}, \{0, 1, B\}, \delta, q_0, B, \{q_f\})$


Informally but clearly describe the language L(M) if δ consists of the following
sets of rules:


* a) $\delta(q_0, 0) = (q_1, 1, R)$; $\delta(q_1, 1) = (q_0, 0, R)$; $\delta(q_1, B) = (q_f, B, R)$.


b) $\delta(q_0, 0) = (q_0, B, R)$; $\delta(q_0, 1) = (q_1, B, R)$; $\delta(q_1, 1) = (q_1, B, R)$; $\delta(q_1, B) = (q_f, B, R)$.


! c) $\delta(q_0, 0) = (q_1, 1, R)$; $\delta(q_1, 1) = (q_2, 0, L)$; $\delta(q_2, 1) = (q_0, 1, R)$; $\delta(q_1, B) = (q_f, B, R)$.


8.3 Programming Techniques for Turing Machines


Our goal is to give you a sense of how a Turing machine can be used to compute 
in a manner not unlike that of a conventional computer. Eventually, we want 
to convince you that a TM is exactly as powerful as a conventional computer. 
In particular, we shall learn that the Turing machine can perform the sort of 
calculations on other Turing machines that we saw performed in Section 8.1.2 by 
a program that examined other programs. This “introspective” ability of both 
Turing machines and computer programs is what enables us to prove problems 
undecidable. 

To make the ability of a TM clearer, we shall present a number of examples 
of how we might think of the tape and finite control of the Turing machine. 
None of these tricks extends the basic model of the TM; they are only notational
conveniences. Later, we shall use them to simulate extended Turing-machine 
models that have additional features — for instance, more than one tape — by 
the basic TM model. 


8.3.1 Storage in the State 


We can use the finite control not only to represent a position in the “program” 
of the Turing machine, but to hold a finite amount of data. Figure 8.13 suggests 
this technique (as well as another idea: multiple tracks). There, we see the finite 
control consisting of not only a “control” state q, but three data elements A, 
B, and C. The technique requires no extension to the TM model; we merely 
think of the state as a tuple. In the case of Fig. 8.13, we should think of the 
state as [q, A, B, C]. Regarding states this way allows us to describe transitions
in a more systematic way, often making the strategy behind the TM program 
more transparent. 

[Figure: a finite control holding a state q and storage A, B, C; below it, a tape divided into Track 1, Track 2, and Track 3, whose scanned cell holds X, Y, Z]


Figure 8.13: A Turing machine viewed as having finite-control storage and
multiple tracks


Example 8.6: We shall design a TM 


$M = (Q, \{0, 1\}, \{0, 1, B\}, \delta, [q_0, B], B, \{[q_1, B]\})$


that remembers in its finite control the first symbol (0 or 1) that it sees, and 
checks that it does not appear elsewhere on its input. Thus, M accepts the
language $01^* + 10^*$. Accepting regular languages such as this one does not stress
the ability of Turing machines, but it will serve as a simple demonstration. 

The set of states Q is $\{q_0, q_1\} \times \{0, 1, B\}$. That is, the states may be thought
of as pairs with two components:


1. A control portion, q0 or q1, that remembers what the TM is doing. Con-
trol state q0 indicates that M has not yet read its first symbol, while q1
indicates that it has read the symbol, and is checking that it does not
appear elsewhere, by moving right and hoping to reach a blank cell.


2. A data portion, which remembers the first symbol seen, which must be 0
or 1. The symbol B in this component means that no symbol has been
read.


The transition function δ of M is as follows:


1. $\delta([q_0, B], a) = ([q_1, a], a, R)$ for a = 0 or a = 1. Initially, q0 is the control
state, and the data portion of the state is B. The symbol scanned is copied
into the second component of the state, and M moves right, entering
control state q1 as it does so.


2. $\delta([q_1, a], \bar{a}) = ([q_1, a], \bar{a}, R)$ where $\bar{a}$ is the “complement” of a, that is, 0 if
a = 1 and 1 if a = 0. In state q1, M skips over each symbol 0 or 1 that
is different from the one it has stored in its state, and continues moving
right.


3. $\delta([q_1, a], B) = ([q_1, B], B, R)$ for a = 0 or a = 1. If M reaches the first
blank, it enters the accepting state [q1, B].




Notice that M has no definition for δ([q1, a], a) for a = 0 or a = 1. Thus, if
M encounters a second occurrence of the symbol it stored initially in its finite 
control, it halts without having entered the accepting state. 
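Because a state of M is just a pair, a program can generate δ by looping over the data component instead of listing all six rules by hand. This Python sketch builds the three rule schemas for a = 0 and a = 1 and then runs M. Every rule of this particular M moves right and rewrites the symbol it read, so the sketch reads the input left to right instead of keeping a tape. That shortcut is specific to this example. The names `build_delta` and `accepts` are hypothetical.

```python
from typing import Dict, Tuple

State = Tuple[str, str]   # (control portion, data portion)
Rules = Dict[Tuple[State, str], Tuple[State, str, str]]


def build_delta() -> Rules:
    delta: Rules = {}
    for a in "01":
        abar = "1" if a == "0" else "0"                      # the "complement" of a
        delta[(("q0", "B"), a)] = (("q1", a), a, "R")        # rule 1: store a in the state
        delta[(("q1", a), abar)] = (("q1", a), abar, "R")    # rule 2: skip the other symbol
        delta[(("q1", a), "B")] = (("q1", "B"), "B", "R")    # rule 3: blank reached, accept
    return delta


DELTA = build_delta()
ACCEPT: State = ("q1", "B")


def accepts(w: str) -> bool:
    """Run M on w; valid only because every rule of this M moves right."""
    state, i = ("q0", "B"), 0
    while state != ACCEPT:
        symbol = w[i] if i < len(w) else "B"
        move = DELTA.get((state, symbol))
        if move is None:
            return False          # e.g., a second copy of the stored symbol
        state = move[0]
        i += 1
    return True
```

As expected, `len(DELTA)` is 6: three rule schemas, each instantiated for a = 0 and a = 1.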


8.3.2 Multiple Tracks 


Another useful “trick” is to think of the tape of a Turing machine as composed 
of several tracks. Each track can hold one symbol, and the tape alphabet of the 
TM consists of tuples, with one component for each “track.” Thus, for instance, 
the cell scanned by the tape head in Fig. 8.13 contains the symbol [X, Y, Z]. 
Like the technique of storage in the finite control, using multiple tracks does 
not extend what the Turing machine can do. It is simply a way to view tape 
symbols and to imagine that they have a useful structure. 


Example 8.7: A common use of multiple tracks is to treat one track as holding 
the data and a second track as holding a mark. We can check off each symbol 
as we “use” it, or we can keep track of a small number of positions within the 
data by marking only those positions. Examples 8.2 and 8.4 were two instances 
of this technique, but in neither example did we think explicitly of the tape as 
if it were composed of tracks. In the present example, we shall use a second 
track explicitly to recognize the non-context-free language 


$$L_{wcw} = \{wcw \mid w \text{ is in } (0+1)^+\}$$
The Turing machine we shall design is:
$$M = (Q, \Sigma, \Gamma, \delta, [q_1, B], [B, B], \{[q_9, B]\})$$
where: 


Q: The set of states is {q1, q2, ..., q9} × {0, 1, B}, that is, pairs consisting
of a control state qi and a data component: 0, 1, or blank. We again
use the technique of storage in the finite control, as we allow the state to
remember an input symbol 0 or 1.


Γ: The set of tape symbols is {B, *} × {0, 1, c, B}. The first component, or
track, can be either blank or “checked,” represented by the symbols B
and *, respectively. We use the * to check off symbols of the first and
second groups of 0’s and 1’s, eventually confirming that the string to the
left of the center marker c is the same as the string to its right. The
second component of the tape symbol is what we think of as the tape
symbol itself. That is, we may think of the symbol [B, X] as if it were
the tape symbol X, for X = 0, 1, c, B.


Σ: The input symbols are [B, 0], [B, 1], and [B, c], which, as just mentioned,
we identify with 0, 1, and c, respectively.

δ: The transition function δ is defined by the following rules, in which a and
b each may stand for either 0 or 1.

1. δ([q1, B], [B, a]) = ([q2, a], [*, a], R). In the initial state, M picks up
the symbol a (which can be either 0 or 1), stores it in its finite control,
goes to control state q2, “checks off” the symbol it just scanned, and
moves right. Notice that by changing the first component of the tape
symbol from B to *, it performs the check-off.

2. δ([q2, a], [B, b]) = ([q2, a], [B, b], R). M moves right, looking for the
symbol c. Remember that a and b can each be either 0 or 1, independently,
but cannot be c.

3. δ([q2, a], [B, c]) = ([q3, a], [B, c], R). When M finds the c, it continues
to move right, but changes to control state q3.

4. δ([q3, a], [*, b]) = ([q3, a], [*, b], R). In state q3, M continues past all
checked symbols.

5. δ([q3, a], [B, a]) = ([q4, B], [*, a], L). If the first unchecked symbol
that M finds is the same as the symbol in its finite control, it checks
this symbol, because it has matched the corresponding symbol from
the first block of 0’s and 1’s. M goes to control state q4, dropping
the symbol from its finite control, and starts moving left.

6. δ([q4, B], [*, a]) = ([q4, B], [*, a], L). M moves left over checked symbols.

7. δ([q4, B], [B, c]) = ([q5, B], [B, c], L). When M encounters the symbol
c, it switches to state q5 and continues left. In state q5, M must
make a decision, depending on whether or not the symbol immediately
to the left of the c is checked or unchecked. If checked, then we
have already considered the entire first block of 0’s and 1’s (those
to the left of the c). We must make sure that all the 0’s and 1’s to the
right of the c are also checked, and accept if no unchecked symbols
remain to the right of the c. If the symbol immediately to the left
of the c is unchecked, we find the leftmost unchecked symbol, pick it
up, and start the cycle that began in state q1.

8. δ([q5, B], [B, a]) = ([q6, B], [B, a], L). This branch covers the case
where the symbol to the left of c is unchecked. M goes to state q6
and continues left, looking for a checked symbol.

9. δ([q6, B], [B, a]) = ([q6, B], [B, a], L). As long as symbols are
unchecked, M remains in state q6 and proceeds left.

10. δ([q6, B], [*, a]) = ([q1, B], [*, a], R). When the checked symbol is
found, M enters state q1 and moves right to pick up the first unchecked
symbol.

11. δ([q5, B], [*, a]) = ([q7, B], [*, a], R). Now, let us pick up the branch
from state q5 where we have just moved left from the c and find a
checked symbol. We start moving right again, entering state q7.




12. δ([q7, B], [B, c]) = ([q8, B], [B, c], R). In state q7 we shall surely see
the c. We enter state q8 as we do so, and proceed right.

13. δ([q8, B], [*, a]) = ([q8, B], [*, a], R). M moves right in state q8, skipping
over any checked 0’s or 1’s that it finds.

14. δ([q8, B], [B, B]) = ([q9, B], [B, B], R). If M reaches a blank cell in
state q8 without encountering any unchecked 0 or 1, then M accepts.
If M first finds an unchecked 0 or 1, then the blocks before and after
the c do not match, and M halts without accepting.
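The fourteen rules can be checked mechanically. The following Python sketch is one possible encoding, assuming a state is a (control state, data) pair and a tape symbol is a (check track, data track) pair, with "*" for a checked cell and "B" for blank in either track; the function names are illustrative only. Each dictionary entry is labeled with the rule it transcribes.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a state is (control state, stored symbol or "B");
# a tape symbol is (check track, data track), where "*" marks a checked cell.
State = Tuple[str, str]
Sym = Tuple[str, str]
Rule = Tuple[State, Sym, str]  # (next state, symbol written, "L" or "R")

BITS = "01"


def build_delta_wcw() -> Dict[Tuple[State, Sym], Rule]:
    """Rules 1-14 above; a and b range over 0 and 1."""
    d: Dict[Tuple[State, Sym], Rule] = {}
    for a in BITS:
        d[(("q1", "B"), ("B", a))] = (("q2", a), ("*", a), "R")    # rule 1
        for b in BITS:
            d[(("q2", a), ("B", b))] = (("q2", a), ("B", b), "R")  # rule 2
            d[(("q3", a), ("*", b))] = (("q3", a), ("*", b), "R")  # rule 4
        d[(("q2", a), ("B", "c"))] = (("q3", a), ("B", "c"), "R")  # rule 3
        d[(("q3", a), ("B", a))] = (("q4", "B"), ("*", a), "L")    # rule 5
        d[(("q4", "B"), ("*", a))] = (("q4", "B"), ("*", a), "L")  # rule 6
        d[(("q5", "B"), ("B", a))] = (("q6", "B"), ("B", a), "L")  # rule 8
        d[(("q6", "B"), ("B", a))] = (("q6", "B"), ("B", a), "L")  # rule 9
        d[(("q6", "B"), ("*", a))] = (("q1", "B"), ("*", a), "R")  # rule 10
        d[(("q5", "B"), ("*", a))] = (("q7", "B"), ("*", a), "R")  # rule 11
        d[(("q8", "B"), ("*", a))] = (("q8", "B"), ("*", a), "R")  # rule 13
    d[(("q4", "B"), ("B", "c"))] = (("q5", "B"), ("B", "c"), "L")  # rule 7
    d[(("q7", "B"), ("B", "c"))] = (("q8", "B"), ("B", "c"), "R")  # rule 12
    d[(("q8", "B"), ("B", "B"))] = (("q9", "B"), ("B", "B"), "R")  # rule 14
    return d


def accepts_wcw(w: str, max_steps: int = 100_000) -> bool:
    """Run M on w, identifying each input symbol x with the pair [B, x]."""
    delta = build_delta_wcw()
    tape: Dict[int, Sym] = {i: ("B", x) for i, x in enumerate(w)}
    state: State = ("q1", "B")
    head = 0
    for _ in range(max_steps):
        if state == ("q9", "B"):  # the accepting state
            return True
        rule = delta.get((state, tape.get(head, ("B", "B"))))
        if rule is None:  # no move: halt without accepting
            return False
        state, written, direction = rule
        tape[head] = written
        head += 1 if direction == "R" else -1
    return False
```

With this encoding, accepts_wcw("01c01") and accepts_wcw("110c110") return True, while accepts_wcw("01c10") and accepts_wcw("00c0") return False.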


8.3.3 Subroutines 


As with programs in general, it helps to think of Turing machines as built from 
a collection of interacting components, or “subroutines.” A Turing-machine 
subroutine is a set of states that perform some useful process. This set of states 
includes a start state and another state that temporarily has no moves, and 
that serves as the “return” state to pass control to whatever other set of states 
called the subroutine. The “call” of a subroutine occurs whenever there is a 
transition to its initial state. Since the TM has no mechanism for remembering 
a “return address,” that is, a state to go to after it finishes, should our design 
of a TM call for one subroutine to be called from several states, we can make 
copies of the subroutine, using a new set of states for each copy. The “calls” 
are made to the start states of different copies of the subroutine, and each copy 
“returns” to a different state. 


Example 8.8: We shall design a TM to implement the function “multiplication.”
That is, our TM will start with $0^m10^n1$ on its tape, and will end with
$0^{mn}$ on the tape. An outline of the strategy is:

1. The tape will, in general, have one nonblank string of the form $0^i10^n10^{kn}$
for some k.

2. In one basic step, we change a 0 in the first group to B and add n 0’s to
the last group, giving us a string of the form $0^{i-1}10^n10^{(k+1)n}$.

3. As a result, we copy the group of n 0’s to the end m times, once each
time we change a 0 in the first group to B. When the first group of 0’s is
completely changed to blanks, there will be mn 0’s in the last group.

4. The final step is to change the leading $10^n1$ to blanks, and we are done.
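The outline above can be checked at the level of whole tape strings, ignoring individual head moves. The Python sketch below is not the Turing machine itself; it simply replays steps 1 through 4, with "B" standing for a blanked cell, and asserts the invariant of step 1 after every copy. The function name is hypothetical.

```python
def multiply_by_strategy(m: int, n: int) -> str:
    """Replay steps 1-4 on the tape contents; 'B' marks a blanked cell."""
    tape = list("0" * m + "1" + "0" * n + "1")
    for i in range(m):
        # Step 2: change a 0 of the first group to B ...
        tape[i] = "B"
        # ... and append a copy of the block of n 0's to the last group.
        tape.extend("0" * n)
        # Step 1's invariant 0^i 1 0^n 1 0^(kn), with k = i + 1 copies made.
        k = i + 1
        expected = "0" * (m - k) + "1" + "0" * n + "1" + "0" * (k * n)
        assert "".join(tape).lstrip("B") == expected
    # Step 4: blank the leading 1 0^n 1 (cells m through m + n + 1).
    for j in range(m, m + n + 2):
        tape[j] = "B"
    return "".join(tape).strip("B")
```

For example, multiply_by_strategy(2, 3) returns a string of six 0’s.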


The heart of this algorithm is a subroutine, which we call Copy. This subroutine
helps implement step (2) above, copying the block of n 0’s to the end.
More precisely, Copy converts an ID of the form $0^{m-k}1q_10^n10^{(k-1)n}$ to ID
$0^{m-k}1q_50^n10^{kn}$. Figure 8.14 shows the transitions of subroutine Copy. This
subroutine marks the first 0 with an X, moves right in state q2 until it finds a
blank, copies the 0 there, and moves left in state q3 to find the marker X. It
repeats this cycle until in state q1 it finds a 1 instead of a 0. At that point, it
uses state q4 to change the X’s back to 0’s, and ends in state q5.

Figure 8.14: The subroutine Copy
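The following Python sketch runs Copy cell by cell. Because the diagram of Fig. 8.14 is not reproduced here, the transition table is our reconstruction from the description above (mark a 0 with X, run right in q2, write a 0 on the first blank, run left in q3 to the X, and finally restore the X’s in q4), so treat it as an assumption. The names COPY and run_copy are hypothetical.

```python
from typing import Dict, List, Tuple

# Copy's transitions as reconstructed from the prose (an assumption):
# (state, scanned symbol) -> (next state, symbol written, head move)
COPY: Dict[Tuple[str, str], Tuple[str, str, int]] = {
    ("q1", "0"): ("q2", "X", +1),  # mark the next 0 of the block with X
    ("q1", "1"): ("q4", "1", -1),  # block exhausted: go back and restore
    ("q2", "0"): ("q2", "0", +1),  # run right to the first blank
    ("q2", "1"): ("q2", "1", +1),
    ("q2", "B"): ("q3", "0", -1),  # copy one 0 there
    ("q3", "0"): ("q3", "0", -1),  # run left to the marker X
    ("q3", "1"): ("q3", "1", -1),
    ("q3", "X"): ("q1", "X", +1),
    ("q4", "X"): ("q4", "0", -1),  # change the X's back to 0's
    ("q4", "1"): ("q5", "1", +1),  # return: head on the block's first cell
}


def run_copy(tape: List[str], head: int) -> Tuple[str, int]:
    """Start Copy in state q1 at `head`; stop when it reaches state q5."""
    cells = list(tape)
    state = "q1"
    while state != "q5":
        if head == len(cells):
            cells.append("B")  # the tape is blank beyond its right end
        state, written, move = COPY[(state, cells[head])]
        cells[head] = written
        head += move
    return "".join(cells), head
```

For example, run_copy(list("01001"), 2) returns ("0100100", 2), which corresponds to the ID $01q_500100$, that is, $0^{m-k}1q_50^n10^{kn}$ with $m-k = 1$, $n = 2$, and $k = 1$.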

The complete multiplication Turing machine starts in state q0. The first
thing it does is go, in several steps, from ID $q_00^m10^n$ to ID $0^{m-1}1q_10^n$. The
transitions needed are shown in the portion of Fig. 8.15 to the left of the
subroutine call; these transitions involve states q0 and q6 only.


Figure 8.15: The complete multiplication program uses the subroutine Copy


Then, to the right of the subroutine call in Fig. 8.15 we see states q7 through
q12. The purpose of states q7, q8, and q9 is to take control after Copy has just
copied a block of n 0’s, and is in ID $0^{m-k}1q_50^n10^{kn}$. Eventually, these states
bring us to ID $q_00^{m-k}10^n10^{kn}$. At that point, the cycle starts again, and
Copy is called to copy the block of n 0’s again.

As an exception, in state q8 the TM may find that all m 0’s have been
changed to blanks (i.e., k = m). In that case, a transition to state q10 occurs.
This state, with the help of state q11, changes the leading $10^n1$ to blanks and
enters the halting state q12. At this point, the TM is in ID $q_{12}0^{mn}$, and its job
is done.


8.3.4 Exercises for Section 8.3 


Exercise 8.3.1: Redesign your Turing machines from Exercise 8.2.2 to take 
advantage of the programming techniques discussed in Section 8.3. 


Exercise 8.3.2: A common operation in Turing-machine programs involves 
“shifting over.” Ideally, we would like to create an extra cell at the current 
head position, in which we could store some character. However, we cannot 
edit the tape in this way. Rather, we need to move the contents of each of the 
cells to the right of the current head position one cell right, and then find our 
way back to the current head position. Show how to perform this operation. 
Hint: Leave a special symbol to mark the position to which the head must 
return. 


Exercise 8.3.3: Design a subroutine to move a TM head from its current 
position to the right, skipping over all 0’s, until reaching a 1 or a blank. If the 
current position does not hold 0, then the TM should halt. You may assume 
that there are no tape symbols other than 0, 1, and B (blank). Then, use this 
subroutine to design a TM that accepts all strings of 0’s and 1’s that do not 
have two 1’s in a row. 


8.4 Extensions to the Basic Turing Machine 


In this section we shall see certain computer models that are related to Turing 
machines and have the same language-recognizing power as the basic model of 
a TM with which we have been working. One of these, the multitape Turing 
machine, is important because it is much easier to see how a multitape TM can 
simulate real computers (or other kinds of Turing machines), compared with 
the single-tape model we have been studying. Yet the extra tapes add no power 
to the model, as far as the ability to accept languages is concerned. 

We then consider the nondeterministic Turing machine, an extension of the 
basic model that is allowed to make any of a finite set of choices of move in 
a given situation. This extension also makes “programming” Turing machines 
easier in some circumstances, but adds no language-defining power to the basic 
model. 

8.4.1 Multitape Turing Machines


A multitape TM is as suggested by Fig. 8.16. The device has a finite control 
(state), and some finite number of tapes. Each tape is divided into cells, and 
each cell can hold any symbol of the finite tape alphabet. As in the single-tape 
TM, the set of tape symbols includes a blank, and has a subset called the input 
symbols, of which the blank is not a member. The set of states includes an 
initial state and some accepting states. Initially: 


1. The input, a finite sequence of input symbols, is placed on the first tape.

2. All other cells of all the tapes hold the blank.

3. The finite control is in the initial state.

4. The head of the first tape is at the left end of the input.

5. All other tape heads are at some arbitrary cell. Since tapes other than
the first tape are completely blank, it does not matter where the head is
placed initially; all cells of these tapes “look” the same.


Figure 8.16: A multitape Turing machine


A move of the multitape TM depends on the state and the symbol scanned 
by each of the tape heads. In one move, the multitape TM does the following: 


1. The control enters a new state, which could be the same as the previous
state.

2. On each tape, a new tape symbol is written on the cell scanned. Any of
these symbols may be the same as the symbol previously there.

3. Each of the tape heads makes a move, which can be either left, right, or
stationary. The heads move independently, so different heads may move
in different directions, and some may not move at all.
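One move of a multitape TM, as just described, can be sketched in Python as follows. The representation is our own assumption: each tape is a sparse dictionary from cell index to symbol, and delta maps a state plus the tuple of scanned symbols to a new state, a tuple of symbols to write, and a tuple of directions drawn from L, R, and S. The class and function names, and the two-tape example machine COPY_INPUT, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# delta: (state, scanned symbols, one per tape) ->
#        (new state, symbols to write, one direction per tape: L, R, or S)
Delta = Dict[Tuple[str, Tuple[str, ...]], Tuple[str, Tuple[str, ...], Tuple[str, ...]]]
SHIFT = {"L": -1, "R": +1, "S": 0}


@dataclass
class MultitapeConfig:
    state: str
    tapes: List[Dict[int, str]]  # sparse tapes; absent cells hold the blank
    heads: List[int]


def step(cfg: MultitapeConfig, delta: Delta, blank: str = "B") -> bool:
    """Make one move in place; return False if no move is defined."""
    scanned = tuple(t.get(h, blank) for t, h in zip(cfg.tapes, cfg.heads))
    move = delta.get((cfg.state, scanned))
    if move is None:
        return False
    new_state, writes, dirs = move
    cfg.state = new_state                    # 1. enter a new state
    for i, (sym, d) in enumerate(zip(writes, dirs)):
        cfg.tapes[i][cfg.heads[i]] = sym     # 2. write on each tape
        cfg.heads[i] += SHIFT[d]             # 3. move each head independently
    return True


# Hypothetical example: a 2-tape machine that copies its input to tape 2.
COPY_INPUT: Delta = {("q0", (a, "B")): ("q0", (a, a), ("R", "R")) for a in "01"}
COPY_INPUT[("q0", ("B", "B"))] = ("qf", ("B", "B"), ("S", "S"))
```

Repeatedly calling step on MultitapeConfig("q0", [dict(enumerate("0110")), {}], [0, 0]) with COPY_INPUT ends in state qf with 0110 written on the second tape; the final move uses S on both tapes, an option the one-tape model lacks.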

We shall not give the formal notation of transition rules, whose form is
a straightforward generalization of the notation for the one-tape TM, except 
that directions are now indicated by a choice of L, R, or S. For the one- 
tape machine, we did not allow the head to remain stationary, so the S option 
was not present. You should be able to imagine an appropriate notation for 
instantaneous descriptions of the configuration of a multitape TM; we shall not 
give this notation formally. Multitape Turing machines, like one-tape TM’s, 
accept by entering an accepting state. 


8.4.2 Equivalence of One-Tape and Multitape TM’s 


Recall that the recursively enumerable languages are defined to be those ac- 
cepted by a one-tape TM. Surely, multitape TM’s accept all the recursively 
enumerable languages, since a one-tape TM is a multitape TM. However, are 
there languages that are not recursively enumerable, yet are accepted by mul- 
titape TM’s? The answer is “no,” and we prove this fact by showing how to 
simulate a multitape TM by a one-tape TM. 


Theorem 8.9: Every language accepted by a multitape TM is recursively 
enumerable. 


PROOF: The proof is suggested by Fig. 8.17. Suppose language L is accepted 
by a k-tape TM M. We simulate M with a one-tape TM N whose tape we 
think of as having 2k tracks. Half these tracks hold the tapes of M, and the 
other half of the tracks each hold only a single marker that indicates where the 
head for the corresponding tape of M is currently located. Figure 8.17 assumes 
k = 2. The second and fourth tracks hold the contents of the first and second 
tapes of M, track 1 holds the position of the head of tape 1, and track 3 holds 
the position of the second tape head. 


Figure 8.17: Simulation of a two-tape Turing machine by a one-tape Turing 
machine 
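The track layout of Fig. 8.17 can be sketched in Python as a pair of conversion functions. This is an illustration under simplifying assumptions: each tape of M is given as a string whose first character is cell 0, head positions are nonnegative, "B" is the blank, and "X" is the head marker; encode and decode are hypothetical names. Each cell of N’s tape is a 2k-tuple over finite sets, which also shows why N’s tape alphabet remains finite.

```python
from typing import List, Tuple

Cell = Tuple[str, ...]  # one tape symbol of N, with 2k components


def encode(tapes: List[str], heads: List[int], blank: str = "B") -> List[Cell]:
    """Lay out k tapes of M on N's 2k tracks, as in Fig. 8.17: for tape i
    (counting from 1), track 2i - 1 holds the head marker X and track 2i
    holds the tape contents."""
    width = max([len(t) for t in tapes] + [h + 1 for h in heads])
    cells: List[Cell] = []
    for j in range(width):
        parts: List[str] = []
        for t, h in zip(tapes, heads):
            parts.append("X" if j == h else blank)       # marker track
            parts.append(t[j] if j < len(t) else blank)  # contents track
        cells.append(tuple(parts))
    return cells


def decode(cells: List[Cell], k: int, blank: str = "B") -> Tuple[List[str], List[int]]:
    """Recover M's k tapes and head positions from N's tape."""
    tapes = ["".join(c[2 * i + 1] for c in cells).rstrip(blank) for i in range(k)]
    heads = [next(j for j, c in enumerate(cells) if c[2 * i] == "X") for i in range(k)]
    return tapes, heads
```

For two tapes holding ab and c, with heads at cells 1 and 0, encode returns [("B", "a", "X", "c"), ("X", "b", "B", "B")], and decode recovers the original tapes and head positions.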

A Reminder About Finiteness


A common fallacy is to confuse a value that is finite at any time with a set 
of values that is finite. The many-tapes-to-one construction may help us 
appreciate the difference. In that construction, we used tracks on the tape 
to record the positions of the tape heads. Why could we not store these 
positions as integers in the finite control? Carelessly, one could argue that 
after n moves, each tape head must be within n positions of its original
position, and so the finite control only has to store integers up to n.

The problem is that, while the positions are finite at any time, the 
complete set of positions possible at any time is infinite. If the state is 
to represent any head position, then there must be a data component of 
the state that has any integer as value. This component forces the set of 
states to be infinite, even if only a finite number of them can be used at 
any finite time. The definition of a Turing machine requires that the set 
of states be finite. Thus, it is not permissible to store a tape-head position 
in the finite control. 


To simulate a move of M, N’s head must visit the k head markers. So that 
N not get lost, it must remember how many head markers are to its left at all 
times; that count is stored as a component of N’s finite control. After visiting 
each head marker and storing the scanned symbol in a component of its finite 
control, N knows what tape symbols are being scanned by each of M’s heads. 
N also knows the state of M, which it stores in N’s own finite control. Thus, 
N knows what move M will make. 

N now revisits each of the head markers on its tape, changes the symbol 
in the track representing the corresponding tapes of M, and moves the head 
markers left or right, if necessary. Finally, N changes the state of M as recorded 
in its own finite control. At this point, N has simulated one move of M. 

We select as N’s accepting states all those states that record M’s state as 
one of the accepting states of M. Thus, whenever the simulated M accepts, N 
also accepts, and N does not accept otherwise. 


8.4.3 Running Time and the Many-Tapes-to-One 
Construction 


Let us now introduce a concept that will become quite important later: the 
“time complexity” or “running time” of a Turing machine. We say the running 
time of TM M on input w is the number of steps that M makes before halting. 
If M doesn’t halt on w, then the running time of M on w is infinite. The time 
complexity of TM M is the function T(n) that is the maximum, over all inputs 
w of length n, of the running time of M on w. For Turing machines that do
not halt on all inputs, T(n) may be infinite for some or even all n. However, we 
shall pay special attention to TM’s that do halt on all inputs, and in particular, 
those that have a polynomial time complexity T(n); Section 10.1 initiates this 
study. 

The construction of Theorem 8.9 seems clumsy. In fact, the constructed one- 
tape TM may take much more running time than the multitape TM. However, 
the amounts of time taken by the two Turing machines are commensurate in 
a weak sense: the one-tape TM takes time that is no more than the square of 
the time taken by the other. While “squaring” is not a very strong guarantee, 
it does preserve polynomial running time. We shall see in Chapter 10 that: 


a) The difference between polynomial time and higher growth rates in run- 
ning time is really the divide between what we can solve by computer and 
what is in practice not solvable. 


b) Despite extensive research, the running time needed to solve many prob- 
lems has not been resolved closer than to within some polynomial. Thus, 
the question of whether we are using a one-tape or multitape TM to solve 
the problem is not crucial when we examine the running time needed to 
solve a particular problem. 


The argument that the running times of the one-tape and multitape TM’s are 
within a square of each other is as follows. 


Theorem 8.10: The time taken by the one-tape TM N of Theorem 8.9 to
simulate n moves of the k-tape TM M is $O(n^2)$.


PROOF: After n moves of M, the tape head markers cannot have separated by 
more than 2n cells. Thus, if N starts at the leftmost marker, it has to move 
no more than 2n cells right, to find all the head markers. It can then make 
an excursion leftward, changing the contents of the simulated tapes of M, and 
moving head markers left or right as needed. Doing so requires no more than 
2n moves left, plus at most 2k moves to reverse direction and write a marker 
X in the cell to the right (in the case that a tape head of M moves right). 
Thus, the number of moves by N needed to simulate one of the first n moves 
is no more than 4n + 2k. Since k is a constant, independent of the number of 
moves simulated, this number of moves is O(n). To simulate n moves requires 
no more than n times this amount, or $O(n^2)$.
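Written out, the count in the proof is the following, where each of the n simulated moves is charged the bound 4n + 2k derived above:

```latex
\begin{aligned}
\text{moves of } N \text{ to simulate } n \text{ moves of } M
  &\le \sum_{i=1}^{n} (4n + 2k) \\
  &= 4n^2 + 2kn \\
  &\le (4 + 2k)\,n^2 \qquad (n \ge 1) \\
  &= O(n^2) \qquad \text{since } k \text{ is a constant.}
\end{aligned}
```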


8.4.4 Nondeterministic Turing Machines 


A nondeterministic Turing machine (NTM) differs from the deterministic variety
we have been studying by having a transition function δ such that for each
state q and tape symbol X, δ(q, X) is a set of triples

$$\{(q_1, Y_1, D_1), (q_2, Y_2, D_2), \ldots, (q_k, Y_k, D_k)\}$$

where k is any finite integer. The NTM can choose, at each step, any of the
triples to be the next move. It cannot, however, pick a state from one, a tape 
symbol from another, and the direction from yet another. 

The language accepted by an NTM M is defined in the expected manner, 
in analogy with the other nondeterministic devices, such as NFA’s and PDA’s, 
that we have studied. That is, M accepts an input w if there is any sequence of 
choices of move that leads from the initial ID with w as input, to an ID with an 
accepting state. The existence of other choices that do not lead to an accepting 
state is irrelevant, as it is for the NFA or PDA. 

The NTM’s accept no languages not accepted by a deterministic TM (or
DTM if we need to emphasize that it is deterministic). The proof involves
showing that for every NTM $M_N$, we can construct a DTM $M_D$ that explores
the ID’s that $M_N$ can reach by any sequence of its choices. If $M_D$ finds one
that has an accepting state, then $M_D$ enters an accepting state of its own. $M_D$
must be systematic, putting new ID’s on a queue, rather than a stack, so that
after some finite time $M_D$ has simulated all sequences of up to k moves of $M_N$,
for k = 1, 2, ....


Theorem 8.11: If $M_N$ is a nondeterministic Turing machine, then there is a
deterministic Turing machine $M_D$ such that $L(M_N) = L(M_D)$.


PROOF: $M_D$ will be designed as a multitape TM, sketched in Fig. 8.18. The
first tape of $M_D$ holds a sequence of ID’s of $M_N$, including the state of $M_N$.
One ID of $M_N$ is marked as the “current” ID, whose successor ID’s are in the
process of being discovered. In Fig. 8.18, the third ID is marked by an x along
with the inter-ID separator, which is the *. All ID’s to the left of the current
one have been explored and can be ignored subsequently.


Figure 8.18: Simulation of an NTM by a DTM (finite control; tape 1 holds the queue of ID’s; tape 2 is a scratch tape)

To process the current ID, $M_D$ does the following:


1. $M_D$ examines the state and scanned symbol of the current ID. Built into
the finite control of $M_D$ is the knowledge of what choices of move $M_N$
has for each state and symbol. If the state in the current ID is accepting,
then $M_D$ accepts and simulates $M_N$ no further.

2. However, if the state is not accepting, and the state-symbol combination
has k moves, then $M_D$ uses its second tape to copy the ID and then make
k copies of that ID at the end of the sequence of ID’s on tape 1.

3. $M_D$ modifies each of those k ID’s according to a different one of the k
choices of move that $M_N$ has from its current ID.

4. $M_D$ returns to the marked, current ID, erases the mark, and moves the
mark to the next ID to the right. The cycle then repeats with step (1).


It should be clear that the simulation is accurate, in the sense that $M_D$ will
only accept if it finds that $M_N$ can enter an accepting ID. However, we need
to confirm that if $M_N$ enters an accepting ID after a sequence of n of its own
moves, then $M_D$ will eventually make that ID the current ID and will accept.

Suppose that m is the maximum number of choices $M_N$ has in any configuration.
Then there is one initial ID of $M_N$, at most m ID’s that $M_N$ can reach
after one move, at most $m^2$ ID’s $M_N$ can reach after two moves, and so on.
Thus, after n moves, $M_N$ can reach at most $1 + m + m^2 + \cdots + m^n$ ID’s.
Each of these n + 1 terms is at most $m^n$ (taking m ≥ 1), so this number
is at most $(n+1)m^n$ ID’s.

The order in which $M_D$ explores ID’s of $M_N$ is “breadth first”; that is, it
explores all ID’s reachable by 0 moves (i.e., the initial ID), then all ID’s
reachable by one move, then those reachable by two moves, and so on. In particular,
$M_D$ will make current, and consider the successors of, all ID’s reachable by up
to n moves before considering any ID’s that are only reachable by more than n
moves.

As a consequence, the accepting ID of $M_N$ will be considered by $M_D$ among
the first $(n+1)m^n$ ID’s that it considers. We only care that $M_D$ considers this ID
in some finite time, and this bound is sufficient to assure us that the accepting
ID is considered eventually. Thus, if $M_N$ accepts, then so does $M_D$. Since we
already observed that if $M_D$ accepts it does so only because $M_N$ accepts, we
conclude that $L(M_N) = L(M_D)$.
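The breadth-first strategy of $M_D$ can be sketched in Python, with a queue of ID’s standing in for tape 1. This is only a model under stated assumptions: an ID is a (state, tape string, head position) triple, delta maps a (state, symbol) pair to a set of (state, symbol, direction) triples, and max_ids is a cap we add so the sketch always returns, whereas the real $M_D$ simply runs forever when $M_N$ does not accept. All names, including the example machine GUESS_11 that nondeterministically guesses where a substring 11 begins, are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

ID = Tuple[str, str, int]       # (state, tape contents, head position)
Choice = Tuple[str, str, str]   # (next state, symbol written, "L" or "R")
NDelta = Dict[Tuple[str, str], Iterable[Choice]]


def _pad(tape: str, head: int, blank: str) -> Tuple[str, int]:
    """Keep the head on a stored cell by adding one blank at either end."""
    if head < 0:
        return blank + tape, 0
    if head >= len(tape):
        return tape + blank, head
    return tape, head


def ntm_accepts(w: str, delta: NDelta, start: str, accepting: Set[str],
                max_ids: int = 100_000, blank: str = "B") -> bool:
    """Breadth-first search of M_N's ID's using a queue, as M_D does.

    max_ids is a safety cap for this sketch only; the real M_D has none."""
    first: ID = (start, w or blank, 0)
    queue = deque([first])
    examined = 0
    while queue and examined < max_ids:
        state, tape, head = queue.popleft()  # the "current" ID
        examined += 1
        if state in accepting:
            return True
        # Append one successor ID per choice of move (steps 2 and 3).
        for p, y, d in delta.get((state, tape[head]), ()):
            new_tape = tape[:head] + y + tape[head + 1:]
            new_head = head + (1 if d == "R" else -1)
            queue.append((p, *_pad(new_tape, new_head, blank)))
    return False


# Hypothetical NTM: on a 1, either keep scanning or guess that "11" starts here.
GUESS_11: NDelta = {
    ("q0", "0"): {("q0", "0", "R")},
    ("q0", "1"): {("q0", "1", "R"), ("q1", "1", "R")},
    ("q1", "1"): {("qf", "1", "R")},
}
```

With this machine, ntm_accepts("0110", GUESS_11, "q0", {"qf"}) returns True and ntm_accepts("0101", GUESS_11, "q0", {"qf"}) returns False, because every branch of the second computation dies without reaching qf.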


Notice that the constructed deterministic TM may take exponentially more 
time than the nondeterministic TM. It is unknown whether or not this expo- 
nential slowdown is necessary. In fact, Chapter 10 is devoted to this question 
and the consequences of someone discovering a better way to simulate NTM’s 
deterministically. 


8.4.5 Exercises for Section 8.4 


Exercise 8.4.1: Informally but clearly describe multitape Turing machines 
that accept each of the languages of Exercise 8.2.2. Try to make each of your 
Turing machines run in time proportional to the input length. 

Exercise 8.4.2: Here is the transition function of a nondeterministic TM
M = ({q0, q1, q2}, {0, 1}, {0, 1, B}, δ, q0, B, {q2}):

δ  | 0                        | 1                        | B
q0 | {(q0, 1, R)}             | {(q1, 0, R)}             | ∅
q1 | {(q1, 0, R), (q0, 0, L)} | {(q1, 1, R), (q0, 1, L)} | {(q2, B, R)}
q2 | ∅                        | ∅                        | ∅

Show the ID’s reachable from the initial ID if the input is:
* a) 01.
b) 011.


Exercise 8.4.3: Informally but clearly describe nondeterministic Turing ma- 
chines — multitape if you like — that accept the following languages. Try to 
take advantage of nondeterminism to avoid iteration and save time in the non- 
deterministic sense. That is, prefer to have your NTM branch a lot, while each 
branch is short. 


* a) The language of all strings of 0’s and 1’s that have some string of length
100 that repeats, not necessarily consecutively. Formally, this language is
the set of strings of 0’s and 1’s of the form wxyxz, where |x| = 100, and
w, y, and z are of arbitrary length.

b) The language of all strings of the form $w_1\#w_2\#\cdots\#w_n$, for any n, such
that each $w_i$ is a string of 0’s and 1’s, and for some j, $w_j$ is the integer j
in binary.

c) The language of all strings of the same form as (b), but for at least two
values of j, we have $w_j$ equal to j in binary.


Exercise 8.4.4: Consider the nondeterministic Turing machine 


M = ({q0, q1, q2, qf}, {0, 1}, {0, 1, B}, δ, q0, B, {qf})


Informally but clearly describe the language L(M) if δ consists of the following
sets of rules: δ(q0, 0) = {(q0, 1, R), (q1, 1, R)}; δ(q1, 1) = {(q2, 0, L)}; δ(q2, 1) =
{(q0, 1, R)}; δ(q1, B) = {(qf, B, R)}.


Exercise 8.4.5: Consider a nondeterministic TM whose tape is infinite in 
both directions. At some time, the tape is completely blank, except for one 
cell, which holds the symbol $. The head is currently at some blank cell, and 
the state is q. 


a) Write transitions that will enable the NTM to enter state p, scanning the 
$. 


! b) Suppose the TM were deterministic instead. How would you enable it to 
find the $ and enter state p? 

Exercise 8.4.6: Design the following 2-tape TM to accept the language of all
strings of 0’s and 1’s with an equal number of each. The first tape contains the 
input, and is scanned from left to right. The second tape is used to store the 
excess of 0’s over 1’s, or vice-versa, in the part of the input seen so far. Specify 
the states, transitions, and the intuitive purpose of each state. 


Exercise 8.4.7: In this exercise, we shall implement a stack using a special 
3-tape TM. 


1. The first tape will be used only to hold and read the input. The input 
alphabet consists of the symbol f, which we shall interpret as “pop the 
stack,” and the symbols a and b, which are interpreted as “push an a 
(respectively b) onto the stack.” 


2. The second tape is used to store the stack. 


3. The third tape is the output tape. Every time a symbol is popped from 
the stack, it must be written on the output tape, following all previously 
written symbols. 


The Turing machine is required to start with an empty stack and implement the 
sequence of push and pop operations, as specified on the input, reading from 
left to right. If the input causes the TM to try to pop an empty stack, then it
must halt in a special error state qe. If the entire input leaves the stack empty 
at the end, then the input is accepted by going to the final state qf. Describe 
the transition function of the TM informally but clearly. Also, give a summary 
of the purpose of each state you use. 


Exercise 8.4.8: In Fig. 8.17 we saw an example of the general simulation of 
a k-tape TM by a one-tape TM. 


* a) Suppose this technique is used to simulate a 5-tape TM that had a tape 
alphabet of seven symbols. How many tape symbols would the one-tape 
TM have? 


* b) An alternative way to simulate k tapes by one is to use a (k + 1)st track 
to hold the head positions of all k tapes, while the first k tracks simulate 
the k tapes in the obvious manner. Note that in the (k + 1)st track, we 
must be careful to distinguish among the tape heads and to allow for the 
possibility that two or more heads are at the same cell. Does this method 
reduce the number of tape symbols needed for the one-tape TM? 


c) Another way to simulate k tapes by 1 is to avoid storing the head positions 
altogether. Rather, a (k + 1)st track is used only to mark one cell of the 
tape. At all times, each simulated tape is positioned on its track so the 
head is at the marked cell. If the k-tape TM moves the head of tape i, then 
the simulating one-tape TM slides the entire nonblank contents of the ith 
track one cell in the opposite direction, so the marked cell continues to 
hold the cell scanned by the ith tape head of the k-tape TM. Does this
method help reduce the number of tape symbols of the one-tape TM? 
Does it have any drawbacks compared with the other methods discussed? 


Exercise 8.4.9: A k-head Turing machine has k heads reading cells of one 
tape. A move of this TM depends on the state and on the symbol scanned 
by each head. In one move, the TM can change state, write a new symbol 
on the cell scanned by each head, and can move each head left, right, or keep 
it stationary. Since several heads may be scanning the same cell, we assume 
the heads are numbered 1 through k, and the symbol written by the highest 
numbered head scanning a given cell is the one that actually gets written there. 
Prove that the languages accepted by k-head Turing machines are the same as 
those accepted by ordinary TM’s. 


Exercise 8.4.10: A two-dimensional Turing machine has the usual finite-state 
control but a tape that is a two-dimensional grid of cells, infinite in all directions. 
The input is placed on one row of the grid, with the head at the left end of the 
input and the control in the start state, as usual. Acceptance is by entering a 
final state, also as usual. Prove that the languages accepted by two-dimensional 
Turing machines are the same as those accepted by ordinary TM’s. 


8.5 Restricted Turing Machines 


We have seen seeming generalizations of the Turing machine that do not add any 
language-recognizing power. Now, we shall consider some examples of apparent 
restrictions on the TM that also give exactly the same language-recognizing 
power. Our first restriction is minor but useful in a number of constructions 
to be seen later: we replace the TM tape that is infinite in both directions by 
a tape that is infinite only to the right. We also forbid this restricted TM to 
print a blank as the replacement tape symbol. The value of these restrictions 
is that we can assume ID’s consist of only nonblank symbols, and that they 
always begin at the left end of the input. 

We then explore certain kinds of multitape Turing machines that are gen- 
eralized pushdown automata. First, we restrict the tapes of the TM to behave 
like stacks. Then, we further restrict the tapes to be “counters,” that is, they 
can only represent one integer, and the TM can only distinguish a count of 0 
from any nonzero count. The impact of this discussion is that there are several 
very simple kinds of automata that have the full power of any computer. More- 
over, undecidable problems about Turing machines, which we see in Chapter 9, 
apply as well to these simple machines. 


8.5.1 Turing Machines With Semi-infinite Tapes 


While we have allowed the tape head of a Turing machine to move either left 
or right from its initial position, it is only necessary that the TM’s head be 
allowed to move within the positions at and to the right of the initial head
position. In fact, we can assume the tape is semi-infinite, that is, there are no 
cells to the left of the initial head position. In the next theorem, we shall give a 
construction that shows a TM with a semi-infinite tape can simulate one whose 
tape is, like our original TM model, infinite in both directions. 

The trick behind the construction is to use two tracks on the semi-infinite 
tape. The upper track represents the cells of the original TM that are at or to 
the right of the initial head position. The lower track represents the positions 
left of the initial position, but in reverse order. The exact arrangement is 
suggested in Fig. 8.19. The upper track represents cells $X_0, X_1, \ldots$, where $X_0$
is the initial position of the head; $X_1$, $X_2$, and so on, are the cells to its right.
Cells $X_{-1}$, $X_{-2}$, and so on, represent cells to the left of the initial position.
Notice the * on the leftmost cell’s bottom track. This symbol serves as an 
endmarker and prevents the head of the semi-infinite TM from accidentally 
falling off the left end of the tape. 
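The folding can be stated as a small index mapping. The following is a minimal Python sketch; the function names `fold` and `unfold` and the track labels "U" and "L" are our own, chosen for illustration. Cell 0 of the lower track holds the endmarker $*$, so it has no counterpart on the two-way tape.

```python
def fold(pos):
    """Map a position on the two-way tape (0 = initial head cell) to
    (cell of the semi-infinite tape, track), following Fig. 8.19."""
    if pos >= 0:
        return pos, "U"      # X_0, X_1, X_2, ... lie on the upper track
    return -pos, "L"         # X_-1, X_-2, ... lie on the lower track, reversed


def unfold(cell, track):
    """Inverse of fold. Lower-track cell 0 holds the endmarker *."""
    if track == "U":
        return cell
    if cell == 0:
        raise ValueError("lower-track cell 0 holds the endmarker *")
    return -cell
```

For example, `fold(-1)` gives `(1, "L")`: $X_{-1}$ sits directly below $X_1$.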


[Diagram: upper track $X_0, X_1, X_2, \ldots$; lower track $*, X_{-1}, X_{-2}, \ldots$]


Figure 8.19: A semi-infinite tape can simulate a two-way infinite tape 


We shall make one more restriction to our Turing machine: it never writes a 
blank. This simple restriction, coupled with the restriction that the tape is only 
semi-infinite, means that the tape is at all times a prefix of nonblank symbols 
followed by an infinity of blanks. Further, the sequence of nonblanks always 
begins at the initial tape position. We shall see in Theorem 9.19, and again in 
Theorem 10.9, how useful it is to assume ID’s have this form. 


Theorem 8.12: Every language accepted by a TM $M_2$ is also accepted by a
TM $M_1$ with the following restrictions:


1. $M_1$'s head never moves left of its initial position.
2. $M_1$ never writes a blank.


PROOF: Condition (2) is quite easy. Create a new tape symbol B’ that func- 
tions as a blank, but is not the blank B. That is: 


a) If $M_2$ has a rule $\delta_2(q, X) = (p, B, D)$, change this rule to $\delta_2(q, X) =
(p, B', D)$.


b) Then, let $\delta_2(q, B')$ be the same as $\delta_2(q, B)$, for every state $q$.
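The two-step modification for condition (2) can be sketched in Python. We assume, for illustration only, that a transition function is a dict from (state, symbol) to (state, symbol, direction), that the blank is spelled "B", and that the new symbol $B'$ is spelled "B'".

```python
BLANK, BLANK2 = "B", "B'"   # B and its stand-in B' (our spelling)


def never_write_blank(delta):
    """Return a copy of delta that writes B' wherever delta wrote B,
    and that reads B' exactly as it reads B."""
    new = {}
    for (q, x), (p, y, d) in delta.items():
        # Rule (a): replace a written blank by B'.
        new[(q, x)] = (p, BLANK2 if y == BLANK else y, d)
    for (q, x), move in list(new.items()):
        # Rule (b): on reading B', behave as (modified) on reading B.
        if x == BLANK:
            new[(q, BLANK2)] = move
    return new
```

Because rule (b) copies the already-modified move, no transition of the result ever writes $B$.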
Condition (1) requires more effort. Let 


$$M_2 = (Q_2, \Sigma, \Gamma_2, \delta_2, q_2, B, F_2)$$

be the TM $M_2$ as modified above, so it never writes the blank B. Construct


$$M_1 = (Q_1, \Sigma \times \{B\}, \Gamma_1, \delta_1, q_0, [B, B], F_1)$$


where: 


$Q_1$: The states of $M_1$ are $\{q_0, q_1\} \cup (Q_2 \times \{U, L\})$. That is, the states of $M_1$
are the initial state $q_0$, another state $q_1$, and all the states of $M_2$ with a
second data component that is either $U$ or $L$ (upper or lower). The second
component tells us whether the upper or lower track, as in Fig. 8.19, is
being scanned by $M_2$. Put another way, $U$ means the head of $M_2$ is at
or to the right of its initial position, and $L$ means it is to the left of that
position.


$\Gamma_1$: The tape symbols of $M_1$ are all pairs of symbols from $\Gamma_2$, that is, $\Gamma_2 \times \Gamma_2$.
The input symbols of $M_1$ are those pairs with an input symbol of $M_2$ in
the first component and a blank in the second component, that is, pairs
of the form $[a, B]$, where $a$ is in $\Sigma$. The blank of $M_1$ has blanks in both
components. Additionally, for every symbol $X$ in $\Gamma_2$, there is a pair $[X, *]$
in $\Gamma_1$. Here, $*$ is a new symbol, not in $\Gamma_2$, and serves to mark the left end
of $M_1$'s tape.


$\delta_1$: The transitions of $M_1$ are as follows:


1. $\delta_1(q_0, [a, B]) = (q_1, [a, *], R)$, for any $a$ in $\Sigma$. The first move of $M_1$
puts the $*$ marker in the lower track of the leftmost cell. The state
becomes $q_1$, and the head moves right, because it cannot move left
or remain stationary.

2. $\delta_1(q_1, [X, B]) = ([q_2, U], [X, B], L)$, for any $X$ in $\Gamma_2$. In state $q_1$, $M_1$
establishes the initial conditions of $M_2$, by returning the head to its
initial position and changing the state to $[q_2, U]$, i.e., the initial state
of $M_2$, with attention focused on the upper track of $M_1$.

3. If $\delta_2(q, X) = (p, Y, D)$, then for every $Z$ in $\Gamma_2$:

(a) $\delta_1([q, U], [X, Z]) = ([p, U], [Y, Z], D)$ and

(b) $\delta_1([q, L], [Z, X]) = ([p, L], [Z, Y], \bar{D})$,

where $\bar{D}$ is the direction opposite $D$, that is, $L$ if $D = R$ and $R$ if
$D = L$. If $M_1$ is not at its leftmost cell, then it simulates $M_2$ on
the appropriate track: the upper track if the second component of the
state is $U$ and the lower track if the second component is $L$. Note,
however, that when working on the lower track, $M_1$ moves in the
direction opposite that of $M_2$. That choice makes sense, because the
left half of $M_2$'s tape has been folded, in reverse, along the lower
track of $M_1$'s tape.


4. If $\delta_2(q, X) = (p, Y, R)$, then

$$\delta_1([q, L], [X, *]) = \delta_1([q, U], [X, *]) = ([p, U], [Y, *], R)$$

This rule covers one case of how the left endmarker $*$ is handled. If
$M_2$ moves right from its initial position, then regardless of whether
it had previously been to the left or the right of that position (as
reflected in the fact that the second component of $M_1$'s state could
be $L$ or $U$), $M_1$ must move right and focus on the upper track. That
is, $M_1$ will next be at the position represented by $X_1$ in Fig. 8.19.


5. If $\delta_2(q, X) = (p, Y, L)$, then

$$\delta_1([q, L], [X, *]) = \delta_1([q, U], [X, *]) = ([p, L], [Y, *], R)$$

This rule is similar to the previous one, but covers the case where $M_2$
moves left from its initial position. $M_1$ must move right from its
endmarker, but now focuses on the lower track, i.e., the cell indicated
by $X_{-1}$ in Fig. 8.19.


$F_1$: The accepting states $F_1$ are those states in $F_2 \times \{U, L\}$, that is, all states
of $M_1$ whose first component is an accepting state of $M_2$. The attention
of $M_1$ may be focused on either the upper or lower track at the time it
accepts.


The proof of the theorem is now essentially complete. We may observe by 
induction on the number of moves made by $M_2$ that $M_1$ will mimic the ID of
$M_2$ on its own tape, if you take the lower track, reverse it, and follow it by the
upper track. Also, we note that $M_1$ enters one of its accepting states exactly
when $M_2$ does. Thus, $L(M_1) = L(M_2)$.
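To check the construction mechanically, the following Python sketch builds $\delta_1$ from $\delta_2$ by rules 1 through 5 and runs both machines. It is an illustration under our own assumptions: transition functions are dicts, directions are "L" and "R", a state $[q, U]$ is the tuple `(q, "U")`, and a track pair $[X, Y]$ is the tuple `(X, Y)`. The example machine `DELTA2` is hypothetical; it accepts nonempty strings over $\{0, 1\}$ whose first and last symbols agree, and it uses the cell to the left of its input, so its head does move left of the initial position. Like rule 1, the sketch assumes a nonempty input.

```python
STAR = "*"                      # left endmarker on M1's lower track
OPP = {"L": "R", "R": "L"}      # the direction opposite D


def build_delta1(delta2, sigma, gamma2, q2, blank="B"):
    """Rules 1-5 of the proof: a semi-infinite-tape M1 from a two-way M2."""
    d1 = {}
    for a in sigma:                                        # rule 1
        d1[("q0", (a, blank))] = ("q1", (a, STAR), "R")
    for x in gamma2:                                       # rule 2
        d1[("q1", (x, blank))] = ((q2, "U"), (x, blank), "L")
    for (q, x), (p, y, d) in delta2.items():
        for z in gamma2:                                   # rule 3
            d1[((q, "U"), (x, z))] = ((p, "U"), (y, z), d)
            d1[((q, "L"), (z, x))] = ((p, "L"), (z, y), OPP[d])
        new_track = "U" if d == "R" else "L"               # rules 4 and 5
        for t in ("U", "L"):
            d1[((q, t), (x, STAR))] = ((p, new_track), (y, STAR), "R")
    return d1


def run(delta, word, start, accept, blank, semi_infinite=False, steps=10_000):
    """Run a deterministic one-tape TM; True iff it enters an accepting state."""
    tape, state, pos = dict(enumerate(word)), start, 0
    for _ in range(steps):
        if state in accept:
            return True
        move = delta.get((state, tape.get(pos, blank)))
        if move is None:
            return False                  # halts without accepting
        state, tape[pos], d = move
        pos += 1 if d == "R" else -1
        if semi_infinite and pos < 0:
            raise RuntimeError("head fell off the left end")
    return False


# Hypothetical M2: remember the first symbol as A0/A1 in cell -1, mark the
# end with E, then walk back from the last symbol and compare.
DELTA2 = {("s", "0"): ("w0", "0", "L"), ("s", "1"): ("w1", "1", "L"),
          ("w0", "B"): ("r", "A0", "R"), ("w1", "B"): ("r", "A1", "R"),
          ("r", "0"): ("r", "0", "R"), ("r", "1"): ("r", "1", "R"),
          ("r", "B"): ("c", "E", "L"),
          ("c", "0"): ("b0", "0", "L"), ("c", "1"): ("b1", "1", "L"),
          ("b0", "A0"): ("f", "A0", "R"), ("b1", "A1"): ("f", "A1", "R")}
for a in "01":
    for x in "01":
        DELTA2[("b" + a, x)] = ("b" + a, x, "L")
GAMMA2 = {"0", "1", "B", "E", "A0", "A1"}
DELTA1 = build_delta1(DELTA2, "01", GAMMA2, "s")


def both(w):
    """Return (M2 accepts w, M1 accepts w encoded as pairs [a, B])."""
    m2 = run(DELTA2, w, "s", {"f"}, "B")
    m1 = run(DELTA1, [(a, "B") for a in w], "q0",
             {("f", "U"), ("f", "L")}, ("B", "B"), semi_infinite=True)
    return m2, m1
```

Note that `DELTA2` never writes the blank, as the proof assumes after the modification for condition (2).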


8.5.2 Multistack Machines 


We now consider several computing models that are based on generalizations 
of the pushdown automaton. First, we consider what happens when we give 
the PDA several stacks. We already know, from Example 8.7, that a Turing 
machine can accept languages that are not accepted by any PDA with one 
stack. It turns out that if we give the PDA two stacks, then it can accept any 
language that a TM can accept. 

We shall then consider a class of machines called “counter machines.” These 
machines have only the ability to store a finite number of integers (“counters”), 
and to make different moves depending on which, if any, of the counters are 
currently 0. The counter machine can only add or subtract one from the counter, 
and cannot tell two different nonzero counts from each other. In effect, a counter 
is like a stack on which we can place only two symbols: a bottom-of-stack marker 
that appears only at the bottom, and one other symbol that may be pushed 
and popped from the stack. 

We shall not give a formal treatment of the multistack machine, but the 
idea is suggested by Fig. 8.20. A k-stack machine is a deterministic PDA with 
k stacks. It obtains its input, like the PDA does, from an input source, rather 
than having the input placed on a tape or stack, as the TM does. The multistack
machine has a finite control, which is in one of a finite set of states. It has a
finite stack alphabet, which it uses for all its stacks.


[Diagram: the input enters the finite state control, which outputs accept/reject.]

Figure 8.20: A machine with three stacks


A move of the multistack machine is based on:


1. The state of the finite control. 


2. The input symbol read, which is chosen from the finite input alphabet. 
Alternatively, the multistack machine can make a move using $\epsilon$ input, but
to make the machine deterministic, there cannot be a choice of an $\epsilon$-move
or a non-$\epsilon$-move in any situation.


3. The top stack symbol on each of its stacks.

In one move, the multistack machine can:


a) Change to a new state. 


b) Replace the top symbol of each stack with a string of zero or more stack 
symbols. There can be (and usually is) a different replacement string for 
each stack. 

Thus, a typical transition rule for a k-stack machine looks like: 


$$\delta(q, a, X_1, X_2, \ldots, X_k) = (p, \gamma_1, \gamma_2, \ldots, \gamma_k)$$


The interpretation of this rule is that in state $q$, with $X_i$ on top of the $i$th stack,
for $i = 1, 2, \ldots, k$, the machine may consume $a$ (either an input symbol or $\epsilon$)
from its input, go to state $p$, and replace $X_i$ on top of the $i$th stack by string
$\gamma_i$, for each $i = 1, 2, \ldots, k$. The multistack machine accepts by entering a final
state.
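A single move under this rule can be sketched in Python. The encoding is our assumption: the transition function is a dict keyed by `(q, a, X1, ..., Xk)`, with `a == ""` marking an $\epsilon$-move; each replacement string is written with its leftmost symbol on top, as for PDA's; and each stack is a list whose last element is the top. The example machine `TWO_STACK` is hypothetical: it accepts $\{0^n 1^n 2^n \mid n \geq 0\}$, a language no single-stack PDA accepts, by counting the 0's on both stacks and then checking the 1's against the first stack and the 2's against the second.

```python
def step(delta, state, stacks, inp, i):
    """Make one move of a deterministic k-stack machine.
    Returns (new state, new input position), or None if no move applies."""
    tops = tuple(s[-1] if s else None for s in stacks)
    sym = inp[i] if i < len(inp) else None
    eps, real = (state, "", *tops), (state, sym, *tops)
    if eps in delta and real in delta:
        raise ValueError("epsilon-move and input move both possible")
    key, advance = (eps, 0) if eps in delta else (real, 1)
    if key not in delta:
        return None
    p, *gammas = delta[key]
    for s, g in zip(stacks, gammas):
        s.pop()                      # remove X_i ...
        s.extend(reversed(g))        # ... and push gamma_i, leftmost symbol on top
    return p, i + advance


def accepts(delta, start, final, bottoms, w):
    """Run on w followed by the endmarker $; accept on entering a final state."""
    state, i, inp = start, 0, w + "$"
    stacks = [[z] for z in bottoms]
    while state not in final:
        result = step(delta, state, stacks, inp, i)
        if result is None:
            return False
        state, i = result
    return True


TWO_STACK = {
    ("a", "0", "Z", "Z"): ("a", "XZ", "XZ"),   # count a 0 on both stacks
    ("a", "0", "X", "X"): ("a", "XX", "XX"),
    ("a", "1", "X", "X"): ("b", "", "X"),      # 1's pop the first stack
    ("a", "$", "Z", "Z"): ("f", "Z", "Z"),     # the case n = 0
    ("b", "1", "X", "X"): ("b", "", "X"),
    ("b", "2", "Z", "X"): ("c", "Z", ""),      # 2's pop the second stack
    ("c", "2", "Z", "X"): ("c", "Z", ""),
    ("c", "$", "Z", "Z"): ("f", "Z", "Z"),
}
```

For example, `accepts(TWO_STACK, "a", {"f"}, "ZZ", "001122")` returns `True`, while the input `"0012"` is rejected.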


We add one capability that simplifies input processing by this deterministic 
machine: we assume there is a special symbol $, called the endmarker, that 
appears only at the end of the input and is not part of that input. The presence 
of the endmarker allows us to know when we have consumed all the available 
input. We shall see in the next theorem how the endmarker makes it easy for the
multistack machine to simulate a Turing machine. Notice that the conventional 
TM needs no special endmarker, because the first blank serves to mark the end 
of the input. 


Theorem 8.13: If a language L is accepted by a Turing machine, then L is 
accepted by a two-stack machine. 


PROOF: The essential idea is that two stacks can simulate one Turing-machine 
tape, with one stack holding what is to the left of the head and the other stack 
holding what is to the right of the head, except for the infinite strings of blanks 
beyond the leftmost and rightmost nonblanks. In more detail, let L be L(M) 
for some (one-tape) TM M. Our two-stack machine S will do the following: 


1. S begins with a bottom-of-stack marker on each stack. This marker can 
be the start symbol for the stacks, and must not appear elsewhere on the 
stacks. In what follows, we shall say that a “stack is empty” when it 
contains only the bottom-of-stack marker. 


2. Suppose that w$ is on the input of S. S copies w onto its first stack, 
ceasing to copy when it reads the endmarker on the input. 


3. S pops each symbol in turn from its first stack and pushes it onto its 
second stack. Now, the first stack is empty, and the second stack holds 
w, with the left end of w at the top. 


4. S enters the (simulated) start state of M. It has an empty first stack, 
representing the fact that M has nothing but blanks to the left of the cell 
scanned by its tape head. S has a second stack holding w, representing 
the fact that w appears at and to the right of the cell scanned by M’s 
head. 


5. S simulates a move of M as follows. 


(a) S knows the state of M, say q, because S simulates the state of M 
in its own finite control. 


(b) S knows the symbol X scanned by M’s tape head; it is the top 
of S’s second stack. As an exception, if the second stack has only 
the bottom-of-stack marker, then M has just moved to a blank; S 
interprets the symbol scanned by M as the blank. 


(c) Thus, S knows the next move of M. 


(d) The next state of M is recorded in a component of S’s finite control, 
in place of the previous state. 


(e) If M replaces X by Y and moves right, then S pushes Y onto its 
first stack, representing the fact that Y is now to the left of M’s 
head. X is popped off the second stack of S. However, there are two 
exceptions: 
i. If the second stack has only a bottom-of-stack marker (and
therefore, X is the blank), then the second stack is not changed; M
has moved to yet another blank further to the right. 


ii. If Y is blank, and the first stack is empty, then that stack remains 
empty. The reason is that there are still only blanks to the left 
of M’s head. 


(f) If M replaces X by Y and moves left, S pops the top of the first
stack, say Z, then replaces X by ZY on the second stack. This
change reflects the fact that what used to be one position left of the
head is now at the head. (If the second stack holds only the bottom-of-stack
marker, then X is the blank and there is nothing to pop; S simply pushes.)
As an exception, if Z is the bottom-of-stack marker, then S must push BY
onto the second stack and not pop the first stack.


6. S accepts if the new state of M is accepting. Otherwise, S simulates 
another move of M in the same way. 
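The tape-handling part of steps 5(b), 5(e), and 5(f) can be sketched in Python. This is an illustration, not the full machine $S$: the class name `TwoStackTape` and the spellings "Z" for the bottom-of-stack marker and "B" for the blank are our own choices. Each stack is a list whose last element is the top.

```python
BOTTOM, BLANK = "Z", "B"


class TwoStackTape:
    """A TM tape held in two stacks, as in the proof of Theorem 8.13.
    left:  cells to the left of the head, nearest cell on top.
    right: the scanned cell and the cells to its right, scanned cell on top."""

    def __init__(self, w):
        self.left = [BOTTOM]
        self.right = [BOTTOM] + list(reversed(w))   # left end of w on top (step 3)

    def read(self):
        top = self.right[-1]
        return BLANK if top == BOTTOM else top      # step 5(b)

    def move(self, y, d):
        """Replace the scanned symbol by y and move the head in direction d."""
        if self.right[-1] != BOTTOM:
            self.right.pop()                        # remove X (unless X is a blank
                                                    # beyond the rightmost nonblank)
        if d == "R":                                # step 5(e)
            if not (y == BLANK and self.left[-1] == BOTTOM):
                self.left.append(y)                 # exception (ii) skips this
        else:                                       # step 5(f)
            z = self.left.pop() if self.left[-1] != BOTTOM else BLANK
            self.right.extend([y, z])               # push ZY, Z on top

    def contents(self):
        """Return (tape string, head index), bottom markers removed."""
        lhs = self.left[1:]
        rhs = self.right[1:][::-1]
        return "".join(lhs) + "".join(rhs), len(lhs)
```

For instance, starting from `"ab"`, writing x and y while moving right and then writing z while moving left leaves `contents()` equal to `("xyz", 1)`: the head is on the y.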


8.5.3 Counter Machines 


A counter machine may be thought of in one of two ways: 


1. The counter machine has the same structure as the multistack machine 
(Fig. 8.20), but in place of each stack is a counter. Counters hold any 
nonnegative integer, but we can only distinguish between zero and nonzero 
counters. That is, the move of the counter machine depends on its state, 
input symbol, and which, if any, of the counters are zero. In one move, 
the counter machine can: 


(a) Change state.


(b) Add or subtract 1 from any of its counters, independently. However,
a counter is not allowed to become negative, so it cannot subtract 1
from a counter that is currently 0.


2. A counter machine may also be regarded as a restricted multistack ma- 
chine. The restrictions are as follows: 


(a) There are only two stack symbols, which we shall refer to as $Z_0$ (the
bottom-of-stack marker) and $X$.

(b) $Z_0$ is initially on each stack.

(c) We may replace $Z_0$ only by a string of the form $X^i Z_0$, for some $i \geq 0$.

(d) We may replace $X$ only by $X^i$, for some $i \geq 0$. That is, $Z_0$ appears
only on the bottom of each stack, and all other stack symbols, if any,
are $X$.
We shall use definition (1) for counter machines, but the two definitions clearly
define machines of equivalent power. The reason is that stack $X^i Z_0$ can be
identified with the count $i$. In definition (2), we can tell count 0 from other
counts, because for count 0 we see $Z_0$ on top of the stack, and otherwise we see
$X$. However, we cannot distinguish two positive counts, since both have $X$ on
top of the stack.
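The identification of stack $X^i Z_0$ with count $i$ can be sketched in Python. The class `StackCounter` is our own illustration: it offers only the operations allowed in definition (2), and its zero test looks at nothing but the top symbol.

```python
class StackCounter:
    """A counter kept as the stack X^i Z0; the last list element is the top."""

    def __init__(self):
        self.stack = ["Z0"]               # Z0 is initially on the stack

    def is_zero(self):
        return self.stack[-1] == "Z0"     # only the top symbol is visible

    def inc(self):
        self.stack.append("X")            # top T replaced by X T

    def dec(self):
        if self.is_zero():
            raise ValueError("a counter may not become negative")
        self.stack.pop()                  # X replaced by the empty string

    def value(self):
        return len(self.stack) - 1        # the count i for stack X^i Z0
```

Two positive counts look identical to `is_zero`, exactly as the paragraph above observes.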


8.5.4 The Power of Counter Machines 


There are a few observations about the languages accepted by counter machines 
that are obvious but worth stating: 


• Every language accepted by a counter machine is recursively enumerable.
The reason is that a counter machine is a special case of a stack machine, 
and a stack machine is a special case of a multitape Turing machine, which 
accepts only recursively enumerable languages by Theorem 8.9. 


• Every language accepted by a one-counter machine is a CFL. Note that
a counter, in point of view (2), is a stack, so a one-counter machine is a
special case of a one-stack machine, i.e., a PDA. In fact, the languages 
of one-counter machines are accepted by deterministic PDA’s, although 
the proof is surprisingly complex. The difficulty in the proof stems from 
the fact that the multistack and counter machines have an endmarker $ 
at the end of their input. A nondeterministic PDA can guess that it has 
seen the last input symbol and is about to see the $; thus it is clear that a 
nondeterministic PDA without the endmarker can simulate a DPDA with 
the endmarker. However, the hard proof, which we shall not attack, is 
to show that a DPDA without the endmarker can simulate a DPDA with 
the endmarker. 


The surprising result about counter machines is that two counters are enough to 
simulate a Turing machine and therefore to accept every recursively enumerable 
language. It is this result we address now, first showing that three counters are 
enough, and then simulating three counters by two counters. 


Theorem 8.14: Every recursively enumerable language is accepted by a three- 
counter machine. 


PROOF: Begin with Theorem 8.13, which says that every recursively enumer- 
able language is accepted by a two-stack machine. We then need to show how 
to simulate a stack with counters. Suppose there are $r - 1$ stack symbols used
by the stack machine. We may identify the symbols with the digits 1 through
$r - 1$, and think of a stack $X_1 X_2 \cdots X_n$ as an integer in base $r$. That is, this
stack (whose top is at the left end, as usual) is represented by the integer
$$X_n r^{n-1} + X_{n-1} r^{n-2} + \cdots + X_2 r + X_1.$$

We use two counters to hold the integers that represent each of the two 
stacks. The third counter is used to adjust the other two counters. In particular, 
we need the third counter when we either divide or multiply a count by r. 
The operations on a stack can be broken into three kinds: pop the top
symbol, change the top symbol, and push a symbol onto the stack. A move of
the two-stack machine may involve several of these operations; in particular,
replacing the top stack symbol X by a string of symbols must be broken down
into replacing X and then pushing additional symbols onto the stack. We
perform these operations on a stack that is represented by a count $i$, as follows.
Note that it is possible to use the finite control of the counter machine to do
each of the operations that requires counting up to r or less.


1. To pop the stack, we must replace $i$ by $i/r$, throwing away any remainder,
which is $X_1$. Starting with the third counter at 0, we repeatedly reduce
the count i by r, and increase the third counter by 1. When the counter 
that originally held i reaches 0, we stop. Then, we repeatedly increase the 
original counter by 1 and decrease the third counter by 1, until the third 
counter becomes 0 again. At this time, the counter that used to hold i 
holds i/r. 


2. To change X to Y on the top of a stack that is represented by count $i$,
we increment or decrement $i$ by a small amount, surely no more than $r$.
If $Y > X$, as digits, increment $i$ by $Y - X$; if $Y < X$, then decrement $i$ by
$X - Y$.


3. To push X onto a stack that initially holds $i$, we need to replace $i$ by
$ir + X$. We first multiply by $r$. To do so, repeatedly decrement the count
i by 1 and increase the third counter (which starts from 0, as always), by 
r. When the original counter becomes 0, we have ir on the third counter. 
Copy the third counter to the original counter and make the third counter 
0 again, as we did in item (1). Finally, we increment the original counter 
by X. 


To complete the construction, we must initialize the counters to simulate the 
stacks in their initial condition: holding only the start symbol of the two-stack 
machine. This step is accomplished by incrementing the two counters involved
to some small integer, whichever integer from 1 to $r - 1$ corresponds to the start
symbol.
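The three stack operations can be sketched in Python using nothing but increment, decrement, and a zero test on counters. The class `Counter` and the function names are our own. Local variables such as `k` never exceed $r - 1$, so they stand for information the finite control can hold. One liberty is ours: `pop` returns the remainder $X_1$ instead of discarding it, which is one way the finite control could learn the top symbol.

```python
class Counter:
    """A counter: the only operations are inc, dec, and a zero test."""

    def __init__(self, n=0):
        self.n = n

    def inc(self):
        self.n += 1

    def dec(self):
        if self.n == 0:
            raise ValueError("a counter may not become negative")
        self.n -= 1

    def zero(self):
        return self.n == 0


def move_back(t, c):
    """Empty the third counter t into c (c += t, t = 0)."""
    while not t.zero():
        t.dec()
        c.inc()


def pop(c, t, r):
    """Item 1: replace i by i // r; return the remainder X_1 (0 < k < r)."""
    k = 0
    while not c.zero():
        c.dec()
        k += 1
        if k == r:                # r decrements of c -> one increment of t
            t.inc()
            k = 0
    move_back(t, c)
    return k


def change(c, x, y):
    """Item 2: change the top digit from x to y by adding y - x."""
    for _ in range(y - x):
        c.inc()
    for _ in range(x - y):
        c.dec()


def push(c, t, r, x):
    """Item 3: replace i by i * r + x."""
    while not c.zero():
        c.dec()
        for _ in range(r):        # each decrement of c adds r to t
            t.inc()
    move_back(t, c)
    for _ in range(x):
        c.inc()
```

With $r = 4$ and a stack holding just the digit 1, pushing 2 and then 3 leaves the count $(1 \cdot 4 + 2) \cdot 4 + 3 = 27$, and popping returns 3 and leaves 6.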


Theorem 8.15: Every recursively enumerable language is accepted by a two- 
counter machine. 


PROOF: With the previous theorem, we only have to show how to simulate 
three counters with two counters. The idea is to represent the three counters, 
say i, j, and k, by a single integer. The integer we choose is $m = 2^i 3^j 5^k$. One
counter will hold this number, while the other is used to help multiply or divide 
m by one of the first three primes: 2, 3, and 5. To simulate the three-counter 
machine, we need to perform the following operations: 


1. Increment $i$, $j$, and/or $k$. To increment $i$ by 1, we multiply $m$ by 2.
We already saw in the proof of Theorem 8.14 how to multiply a count
by any constant $r$, using a second counter. Likewise, we increment $j$ by
multiplying $m$ by 3, and we increment $k$ by multiplying $m$ by 5.


2. Tell which, if any, of $i$, $j$, and $k$ are 0. To tell if $i = 0$, we must determine
whether $m$ is divisible by 2. Copy $m$ into the second counter, using the
state of the counter machine to remember whether we have decremented
$m$ an even or odd number of times. If we have decremented $m$ an odd
number of times when it becomes 0, then $i = 0$. We then restore $m$
by copying the second counter to the first. Similarly, we test if $j = 0$
by determining whether $m$ is divisible by 3, and we test if $k = 0$ by
determining whether $m$ is divisible by 5.


3. Decrement $i$, $j$, and/or $k$. To do so, we divide $m$ by 2, 3, or 5, respectively.
The proof of Theorem 8.14 tells us how to perform the division by
any constant, using an extra counter. Since the 3-counter machine cannot
decrease a count below 0, it is an error, and the simulating 2-counter
machine halts without accepting, if $m$ is not evenly divisible by the constant
by which we are dividing.


Choice of Constants in the 3-to-2 Counter Construction


Notice how important it is in the proof of Theorem 8.15 that 2, 3, and 5 are
distinct primes. If we had chosen, say, $m = 2^i 3^j 4^k$, then $m = 12$ could
represent either $i = 0$, $j = 1$, and $k = 1$, or it could represent $i = 2$, $j = 1$,
and $k = 0$. Thus, we could not tell whether $i$ or $k$ was 0, and thus could
not simulate the 3-counter machine reliably.
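The three operations can be sketched in Python with just two counters, `m` and a scratch counter `s`. The names are our own, and so is one detail the text leaves implicit: the machine first increments `m` once, since $m = 1 = 2^0 3^0 5^0$ encodes $i = j = k = 0$. The remainder variables never reach the divisor $d \leq 5$, so they fit in the finite control.

```python
class Counter:
    """A counter: the only operations are inc, dec, and a zero test."""

    def __init__(self):
        self.n = 0

    def inc(self):
        self.n += 1

    def dec(self):
        if self.n == 0:
            raise ValueError("a counter may not become negative")
        self.n -= 1

    def zero(self):
        return self.n == 0


def move_back(s, m):
    """m += s, s = 0."""
    while not s.zero():
        s.dec()
        m.inc()


def times(m, s, d):
    """m <- m * d, using the second counter s (0 before and after)."""
    while not m.zero():
        m.dec()
        for _ in range(d):
            s.inc()
    move_back(s, m)


def divisible(m, s, d):
    """Does d divide m? m is copied to s and back; rem stays below d."""
    rem = 0
    while not m.zero():
        m.dec()
        s.inc()
        rem = (rem + 1) % d
    move_back(s, m)
    return rem == 0


def divide(m, s, d):
    """m <- m // d (called only when d divides m)."""
    k = 0
    while not m.zero():
        m.dec()
        k += 1
        if k == d:
            s.inc()
            k = 0
    move_back(s, m)


PRIMES = (2, 3, 5)   # the prime for simulated counter i, j, k respectively


class ThreeCountersInTwo:
    """Simulated counters i, j, k held as m = 2^i 3^j 5^k."""

    def __init__(self):
        self.m, self.s = Counter(), Counter()
        self.m.inc()                       # m = 1 encodes i = j = k = 0

    def inc(self, which):                  # item 1
        times(self.m, self.s, PRIMES[which])

    def is_zero(self, which):              # item 2
        return not divisible(self.m, self.s, PRIMES[which])

    def dec(self, which):                  # item 3
        if self.is_zero(which):
            raise ValueError("simulated counter would go negative")
        divide(self.m, self.s, PRIMES[which])
```

Incrementing $i$ twice and $k$ once, for example, leaves $m = 2^2 \cdot 5 = 20$, and `is_zero(1)` then reports that $j = 0$.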


8.5.5 Exercises for Section 8.5 


Exercise 8.5.1: Informally but clearly describe counter machines that accept 
the following languages. In each case, use as few counters as possible, but not 
more than two counters. 


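* a) $\{0^n 1^m \mid n \geq m \geq 1\}$.
b) $\{0^n 1^m \mid m \geq n \geq 1\}$.


*! c) $\{a^i b^j c^k \mid i = j \text{ or } i = k\}$.


!! d) $\{a^i b^j c^k \mid i = j \text{ or } i = k \text{ or } j = k\}$.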
Exercise 8.5.2: The purpose of this exercise is to show that a one-stack
machine with an endmarker on the input has no more power than a deterministic
PDA. L$ is the concatenation of language L with the language containing only 
the one string $; that is, L$ is the set of all strings w$ such that w is in L. Show 
that if L$ is a language accepted by a DPDA, where $ is the endmarker symbol, 
not appearing in any string of L, then L is also accepted by some DPDA. Hint: 
This question is really one of showing that the DPDA languages are closed un- 
der the operation L/a defined in Exercise 4.2.2. You must modify the DPDA 
P for L$ by replacing each of its stack symbols X by all possible pairs (X, S), 
where $S$ is a set of states. If $P$ has stack $X_1 X_2 \cdots X_n$, then the constructed
DPDA for $L$ has stack $(X_1, S_1)(X_2, S_2) \cdots (X_n, S_n)$, where each $S_i$ is the set of
states $q$ such that $P$, started in ID $(q, a, X_i X_{i+1} \cdots X_n)$, will accept.


8.6 Turing Machines and Computers 


Now, let us compare the Turing machine and the common sort of computer 
that we use daily. While these models appear rather different, they can accept 
exactly the same languages — the recursively enumerable languages. Since 
the notion of “a common computer” is not well defined mathematically, the 
arguments in this section are necessarily informal. We must appeal to your 
intuition about what computers can do, especially when the numbers involved 
exceed normal limits that are built into the architecture of these machines (e.g., 
32-bit address spaces). The claims of this section can be divided into two parts: 


1. A computer can simulate a Turing machine. 


2. A Turing machine can simulate a computer, and can do so in an amount 
of time that is at most some polynomial in the number of steps taken by 
the computer. 


8.6.1 Simulating a Turing Machine by Computer 


Let us first examine how a computer can simulate a Turing machine. Given 
a particular TM M, we must write a program that acts like M. One aspect 
of M is its finite control. Since there are only a finite number of states and a 
finite number of transition rules, our program can encode states as character 
strings and use a table of transitions, which it looks up to determine each move. 
Likewise, the tape symbols can be encoded as character strings of a fixed length, 
since there are only a finite number of tape symbols. 

A serious question arises when we consider how our program is to simulate 
the Turing-machine tape. This tape can grow infinitely long, but the computer's
memory (main memory, disk, and other storage devices) is finite. Can we
simulate an infinite tape with a fixed amount of memory? 

If there is no opportunity to replace storage devices, then in fact we cannot; 
a computer would then be a finite automaton, and the only languages it could 
accept would be regular. However, common computers have swappable storage
devices, perhaps a “Zip” disk, for example. In fact, the typical hard disk is 
removable and can be replaced by an empty, but otherwise identical disk. 

Since there is no obvious limit on how many disks we could use, let us assume
that as many disks as the computer needs are available. We can thus arrange
that the disks are placed in two stacks, as suggested by Fig. 8.21. One stack 
holds the data in cells of the Turing-machine tape that are located significantly 
to the left of the tape head, and the other stack holds data significantly to the 
right of the tape head. The further down the stacks, the further away from the 
tape head the data is. 
Figure 8.21: Simulating a Turing machine with a common computer


If the tape head of the TM moves sufficiently far to the left that it reaches
cells that are not represented by the disk currently mounted in the computer,
then the computer prints a message “swap left.” The currently mounted disk is removed
by a human operator and placed on the top of the right stack. The disk on top 
of the left stack is mounted in the computer, and computation resumes. 

Similarly, if the TM’s tape head reaches cells so far to the right that these 
cells are not represented by the mounted disk, then a “swap right” message is 
printed. The human operator moves the currently mounted disk to the top of 
the left stack, and mounts the disk on top of the right stack in the computer. 
If either stack is empty when the computer asks that a disk from that stack 
be mounted, then the TM has entered an all-blank region of the tape. In that 
case, the human operator must go to the store and buy a fresh disk to mount. 
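The swapping protocol can be sketched in Python, with each "disk" modeled as a fixed-size list of cells. Everything concrete here is our assumption for illustration: the class name `DiskTape`, the tiny `DISK_SIZE`, and the use of Python lists for the two stacks of unmounted disks (top at the end of each list). A fresh all-blank disk stands in for the trip to the store.

```python
DISK_SIZE = 4        # cells per disk; tiny, for illustration only
BLANK = "B"


class DiskTape:
    """A TM tape stored on removable disks, with two stacks of unmounted disks."""

    def __init__(self, w):
        cells = list(w) or [BLANK]
        disks = [cells[i:i + DISK_SIZE] for i in range(0, len(cells), DISK_SIZE)]
        disks = [d + [BLANK] * (DISK_SIZE - len(d)) for d in disks]
        self.mounted = disks[0]
        self.left = []                 # disks for cells further left
        self.right = disks[:0:-1]      # disks for cells further right; next one on top
        self.offset = 0                # head position within the mounted disk
        self.swaps = 0

    def fresh(self):
        return [BLANK] * DISK_SIZE     # a new, all-blank disk

    def read(self):
        return self.mounted[self.offset]

    def write(self, y):
        self.mounted[self.offset] = y

    def move(self, d):
        self.offset += 1 if d == "R" else -1
        if self.offset == DISK_SIZE:   # "swap right"
            self.left.append(self.mounted)
            self.mounted = self.right.pop() if self.right else self.fresh()
            self.offset = 0
            self.swaps += 1
        elif self.offset < 0:          # "swap left"
            self.right.append(self.mounted)
            self.mounted = self.left.pop() if self.left else self.fresh()
            self.offset = DISK_SIZE - 1
            self.swaps += 1
```

Only one disk is "in the computer" at a time, yet reads and writes behave as on an unbounded tape.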


8.6.2 Simulating a Computer by a Turing Machine 


We also need to consider the opposite comparison: are there things a common
computer can do that a Turing machine cannot? An important subordinate
question is whether the computer can do certain things much faster than a
Turing machine.


The Problem of Very Large Tape Alphabets


The argument of Section 8.6.1 becomes questionable if the number of tape
symbols is so large that the code for one tape symbol doesn't fit on a disk.
There would have to be very many tape symbols indeed, since a 30 gigabyte
disk, for instance, can represent any of $2^{240000000000}$ symbols. Likewise, the
number of states could be so large that we could not represent the state
using the entire disk.

One resolution of this problem begins by limiting the number of tape
symbols a TM uses. We can always encode an arbitrary tape alphabet in
binary. Thus, any TM M can be simulated by another TM M' that uses
only tape symbols 0, 1, and B. However, M' needs many states, since to
simulate a move of M, the TM M' must scan its tape and remember, in its
finite control, all the bits that tell it what symbol M is scanning. In this
manner, we are left with very large state sets, and the PC that simulates
M' may have to mount and dismount several disks when deciding what
the state of M' is and what the next move of M' should be. No one ever
thinks about computers performing tasks of this nature, so the typical
operating system has no support for a program of this type. However, if
we wished, we could program the raw computer and give it this capability.

Fortunately, the question of how to simulate a TM with a huge number
of states or tape symbols can be finessed. We shall see in Section 9.2.3
that one can design a TM that is in effect a “stored program” TM. This
TM, called “universal,” takes the transition function of any TM, encoded
in binary on its tape, and simulates that TM. The universal TM has
quite reasonable numbers of states and tape symbols. By simulating the
universal TM, a common computer can be programmed to accept any
recursively enumerable language that we wish, without having to resort
to simulation of numbers of states that stress the limits of what can be
stored on a disk.


In this section, we argue that a TM can simulate a computer,
and in Section 8.6.3 we argue that the simulation can be done sufficiently fast 
that “only” a polynomial separates the running times of the computer and TM 
on a given problem. Again, let us remind the reader that there are
important reasons to think of all running times that lie within a polynomial of one
another as similar, while exponential differences in running time are “too
much.” We take up the theory of polynomial versus exponential running times 
in Chapter 10. 

To begin our study of how a TM simulates a computer, let us give a realistic 
but informal model of how a typical computer operates. 


a) First, we shall suppose that the storage of a computer consists of an
indefinitely long sequence of words, each with an address. In a real computer,
words might be 32 or 64 bits long, but we shall not put a limit on the 
length of a given word. Addresses will be assumed to be integers 0, 1, 
2, and so on. In a real computer, individual bytes would be numbered 
by consecutive integers, so words would have addresses that are multiples 
of 4 or 8, but this difference is unimportant. Also, in a real computer, 
there would be a limit on the number of words in “memory,” but since we 
want to account for the content of an arbitrary number of disks or other 
storage devices, we shall assume there is no limit to the number of words. 


b) We assume that the program of the computer is stored in some of the 
words of memory. These words each represent a simple instruction, as in 
the machine or assembly language of a typical computer. Examples are 
instructions that move data from one word to another or that add one 
word to another. We assume that “indirect addressing” is permitted, so 
one instruction could refer to another word and use the contents of that 
word as the address of the word to which the operation is applied. This 
capability, found in all modern computers, is needed to perform array 
accesses, to follow links in a list, or to do pointer operations in general. 


c) We assume that each instruction involves a limited (finite) number of 
words, and that each instruction changes the value of at most one word. 


d) A typical computer has registers, which are memory words with especially 
fast access. Often, operations such as addition are restricted to occur in 
registers. We shall not make any such restrictions, but will allow any 
operation to be performed on any word. The relative speed of operations 
on different words will not be taken into account, nor need it be if we 
are only comparing the language-recognizing abilities of computers and 
Turing machines. Even if we are interested in running time to within a 
polynomial, the relative speeds of different word accesses are unimportant,
since those differences are “only” a constant factor. 
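Assumptions (a) through (c) can be made concrete with a toy interpreter. The instruction set below (`copy`, `copyind`, `add`, `halt`) is hypothetical, and for readability instructions are stored as Python tuples rather than as bit strings. Memory is a dict, so there is no limit on the number of words, and Python integers are unbounded, so there is no limit on word length. Each instruction changes at most one word, as in (c), and `copyind` shows indirect addressing, as in (b).

```python
def run_ram(memory, pc=0, max_steps=1000):
    """Execute a hypothetical instruction set on an unbounded memory.
    memory maps address -> word; words never written read as 0."""
    mem = dict(memory)
    for _ in range(max_steps):
        op, *args = mem[pc]
        if op == "halt":
            return mem
        a, b = args
        if op == "copy":          # word[a] <- word[b]
            mem[a] = mem.get(b, 0)
        elif op == "copyind":     # word[a] <- word[word[b]]  (indirect addressing)
            mem[a] = mem.get(mem.get(b, 0), 0)
        elif op == "add":         # word[a] <- word[a] + word[b]
            mem[a] = mem.get(a, 0) + mem.get(b, 0)
        else:
            raise ValueError(f"unknown instruction {op!r}")
        pc += 1
    raise RuntimeError("step limit reached")
```

For example, with word 10 holding the pointer 20 and word 20 holding 7, the program `copyind 11, 10; add 11, 11; halt` leaves 14 in word 11.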


Figure 8.22 suggests how the Turing machine would be designed to simulate 
a computer. This TM uses several tapes, but it could be converted to a one-tape
TM using the construction of Section 8.4.1. The first tape represents the entire 
memory of the computer. We have used a code in which addresses of memory 
words, in numerical order, alternate with the contents of those memory words. 
Both addresses and contents are written in binary. The marker symbols * and 
# are used to make it easy to find the ends of addresses and contents, and to tell 
whether a binary string is an address or contents. Another marker, $, indicates 
the beginning of the sequence of addresses and contents. 
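The encoding of tape 1 and the address search of step 1 below can be sketched in Python. The exact layout is our assumption, since the figure is not reproduced here: we write $\$ a_0 * w_0 \# a_1 * w_1 \# \cdots$, so that $*$ ends each address and $\#$ ends each contents, with everything in binary. The function names are hypothetical.

```python
def encode_memory(words):
    """words[a] is the contents of address a (a nonnegative int).
    Layout (assumed): $ addr * contents # addr * contents # ..."""
    return "$" + "".join(f"{a:b}*{w:b}#" for a, w in enumerate(words))


def lookup(tape, addr_bits):
    """Scan right from the $, as the TM does, comparing each address with
    addr_bits; return the contents (in binary), or None if not present."""
    i = 1                                   # just past the $
    while i < len(tape):
        star = tape.index("*", i)           # end of this address
        hash_ = tape.index("#", star)       # end of its contents
        if tape[i:star] == addr_bits:
            return tape[star + 1:hash_]
        i = hash_ + 1
    return None
```

For example, `encode_memory([5, 0, 3])` produces `"$0*101#1*0#10*11#"`, and looking up address `"10"` yields `"11"`.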

The second tape is the “instruction counter.” This tape holds one integer 
in binary, which represents one of the memory locations on tape 1. The value 
stored in this location will be interpreted as the next computer instruction to 
be executed. 
Figure 8.22: A Turing machine that simulates a typical computer


The third tape holds a “memory address” or the contents of that address 
after the address has been located on tape 1. To execute an instruction, the 
TM must find the contents of one or more memory addresses that hold data 
involved in the computation. First, the desired address is copied onto tape 3 and 
compared with the addresses on tape 1 until a match is found. The contents of
this address are copied onto the third tape and moved to wherever they are needed,
typically to one of the low-numbered addresses that represent the registers of 
the computer. 

Our TM will simulate the instruction cycle of the computer, as follows. 


1. Search the first tape for an address that matches the instruction number 
on tape 2. We start at the $ on the first tape, and move right, comparing 
each address with the contents of tape 2. The comparison of addresses 
on the two tapes is easy, since we need only move the tape heads right, 
in tandem, checking that the symbols scanned are always the same. 


2. When the instruction address is found, examine its value. Let us assume 
that when a word is an instruction, its first few bits represent the action 
to be taken (e.g., copy, add, branch), and the remaining bits code an 
address or addresses that are involved in the action. 


3. If the instruction requires the value of some address, then that address 
will be part of the instruction. Copy that address onto the third tape, and 
mark the position of the instruction, using a second track of the first tape 
(not shown in Fig. 8.22), so we can find our way back to the instruction,
if necessary. Now, search for the memory address on the first tape, and 
copy its value onto tape 3, the tape that holds the memory address. 


4. Execute the instruction, or the part of the instruction involving this value.
We cannot go into all the possible machine instructions. However, some of
the kinds of things we might do with the new value are:


(a) Copy it to some other address. We get the second address from the
instruction, find this address by putting it on tape 3 and searching
for the address on tape 1, as discussed previously. When we find
the second address, we copy the value into the space reserved for the
value of that address. If more space is needed for the new value, or
the new value uses less space than the old value, change the available
space by shifting over. That is:

i. Copy, onto a scratch tape, the entire nonblank tape to the right
of where the new value goes.
ii. Write the new value, using the correct amount of space for that
value.
iii. Recopy the scratch tape onto tape 1, immediately to the right
of the new value.

As a special case, the address may not yet appear on the first tape,
because it has not been used by the computer previously. In this
case, we find the place on the first tape where it belongs, shift over
to make adequate room, and store both the address and the new
value there.

(b) Add the value just found to the value of some other address. Go back
to the instruction to locate the other address. Find this address on
tape 1. Perform a binary addition of the value of that address and the
value stored on tape 3. By scanning the two values from their right
ends, a TM can perform a ripple-carry addition with little difficulty.
Should more space be needed for the result, use the shifting-over
technique to create space on tape 1.

(c) The instruction is a “jump,” that is, a directive to take the next
instruction from the address that is the value now stored on tape 3.
Simply copy tape 3 to tape 2 and begin the instruction cycle again.


5. After performing the instruction, and determining that the instruction is 
not a jump, add 1 to the instruction counter on tape 2 and begin the 
instruction cycle again. 
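Two of the tape manipulations in step 4 are easy to mimic on strings. The sketch below is illustrative only: `ripple_carry_add` adds two binary strings by scanning from their right ends, as in the addition case of step 4, and `shift_over_store` (a hypothetical helper) replaces the old contents occupying positions `start` up to `end` of tape 1 with a value of possibly different length, using a scratch copy as in steps i through iii of the copy case. Python string slicing hides the cell-by-cell copying a TM would actually perform.

```python
def ripple_carry_add(x, y):
    """Add two binary strings, scanning both from their right ends and
    propagating a carry one position at a time."""
    i, j, carry = len(x) - 1, len(y) - 1, 0
    out = []
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(x[i])
            i -= 1
        if j >= 0:
            total += int(y[j])
            j -= 1
        out.append(str(total % 2))
        carry = total // 2
    return "".join(reversed(out)) or "0"


def shift_over_store(tape, start, end, new_value):
    """Replace tape[start:end] (a word's old contents) with new_value,
    which may be longer or shorter, by shifting the rest of the tape."""
    scratch = tape[end:]                # i. copy the tape to the right onto scratch
    written = tape[:start] + new_value  # ii. write the new value in place
    return written + scratch            # iii. recopy scratch just after it


print(ripple_carry_add("1011", "110"))             # 10001  (11 + 6 = 17)
print(shift_over_store("$0*101#1*0#", 3, 6, "1"))  # $0*1#1*0#
```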


There are many other details of how the TM simulates a typical computer. 
We have suggested in Fig. 8.22 a fourth tape holding the simulated input to the 
computer, since the computer must read its input (the word whose membership 
in a language it is testing) from a file. The TM can read from this tape instead. 
A scratch tape is also shown. Simulation of some computer instructions
might make effective use of a scratch tape or tapes to compute arithmetic 
operations such as multiplication. 

Finally, we assume that the computer makes an output that tells whether 
or not its input is accepted. To translate this action into terms that the Turing 
machine can execute, we shall suppose that there is an “accept” instruction of 
the computer, perhaps corresponding to a function call by the computer to put 
yes on an output file. When the TM simulates the execution of this computer 
instruction, it enters an accepting state of its own and halts. 

While the above discussion is far from a complete, formal proof that a TM 
can simulate a typical computer, it should provide you with enough detail to 
convince you that a TM is a valid representation for what a computer can 
do. Thus, in the future, we shall use only the Turing machine as the formal 
representation of what can be computed by any kind of computing device. 


8.6.3 Comparing the Running Times of Computers and Turing Machines


We now must address the issue of running time for the Turing machine that 
simulates a computer. As we have suggested previously: 


• The issue of running time is important because we shall use the TM not
only to examine the question of what can be computed at all, but what
can be computed with enough efficiency that a problem’s computer-based
solution can be used in practice.


• The dividing line separating the tractable (that which can be solved
efficiently) from the intractable (problems that can be solved, but not
fast enough for the solution to be usable) is generally held to be between
what can be computed in polynomial time and what requires more than
any polynomial running time.


• Thus, we need to assure ourselves that if a problem can be solved in
polynomial time on a typical computer, then it can be solved in polynomial
time by a Turing machine, and conversely. Because of this polynomial
equivalence, our conclusions about what a Turing machine can or cannot
do with adequate efficiency apply equally well to a computer.


Recall that in Section 8.4.3 we determined that the difference in running 
time between one-tape and multitape TM’s was polynomial — quadratic, in 
particular. Thus, it is sufficient to show that anything the computer can do, 
the multitape TM described in Section 8.6.2 can do in an amount of time that 
is polynomial in the amount of time the computer takes. We then know that 
the same holds for a one-tape TM. 

Before giving the proof that the Turing machine described above can
simulate n steps of a computer in $O(n^3)$ time, we need to confront the issue of
multiplication as a computer instruction. The problem is that we have not put
a limit on the number of bits that one computer word can hold. If, say, the
computer were to start with a word holding the integer 2, and were to multiply that
word by itself for n consecutive steps, then the word would hold the number
$2^{2^n}$. This number requires $2^n + 1$ bits to represent, so the time the Turing
machine takes to simulate these n instructions would be exponential in n, at
least.
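A quick numerical check of this growth: starting from 2 and squaring n times gives $2^{2^n}$, whose binary representation has $2^n + 1$ bits. The function name is ours, and Python's arbitrary-precision integers stand in for an unbounded computer word.

```python
def lengths_after_squaring(steps):
    """Start with the word 2, square it `steps` times, and record the
    bit length of the word after each squaring."""
    word, lengths = 2, []
    for _ in range(steps):
        word = word * word
        lengths.append(word.bit_length())
    return lengths


print(lengths_after_squaring(5))  # [3, 5, 9, 17, 33]
```

The lengths roughly double at each step, so a TM that must write each word out in binary cannot keep up in polynomial time.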

One approach is to insist that words retain a fixed maximum length, say 
64 bits. Then, multiplications (or other operations) that produced a word too 
long would cause the computer to halt, and the Turing machine would not have 
to simulate it any further. We shall take a more liberal stance: the computer 
may use words that grow to any length, but one computer instruction can only 
produce a word that is one bit longer than the longer of its arguments. 


Example 8.16: Under the above restriction, addition is allowed, since the 
result can only be one bit longer than the maximum length of the addends. 
Multiplication is not allowed, since two m-bit words can have a product of 
length 2m. However, we can simulate a multiplication of m-bit integers by a 
sequence of m additions, interspersed with shifts of the multiplicand one bit 
left (which is another operation that only increases the length of the word by 
1). Thus, we can still multiply arbitrarily long words, but the time taken by 
the computer is proportional to the square of the length of the operands. 
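A minimal sketch of the multiplication in Example 8.16, assuming nonnegative integers. The helper names are ours; the assertions check the one-bit growth rule on every addition and every one-bit shift, so the product is built without any single step violating the restriction.

```python
def shift_and_add_multiply(a, b):
    """Multiply nonnegative integers a and b using only additions and
    one-bit left shifts, asserting that each operation produces a word
    at most one bit longer than its longer argument."""

    def check_growth(result, *args):
        longest = max(arg.bit_length() for arg in args)
        assert result.bit_length() <= longest + 1, "word grew by more than one bit"

    product, multiplicand = 0, a
    for bit in reversed(f"{b:b}"):      # multiplier bits, least significant first
        if bit == "1":
            total = product + multiplicand
            check_growth(total, product, multiplicand)
            product = total
        shifted = multiplicand << 1     # shift the multiplicand one bit left
        check_growth(shifted, multiplicand)
        multiplicand = shifted
    return product


print(shift_and_add_multiply(13, 11))  # 143
```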


Assuming one-bit maximum growth per computer instruction executed, we 
can prove our polynomial relationship between the two running times. The 
idea of the proof is to notice that after n instructions have been executed, the 
number of words mentioned on the memory tape of the TM is O(n), and each 
computer word requires O(n) Turing-machine cells to represent it. Thus, the
tape is $O(n^2)$ cells long, and the TM can locate the finite number of words
needed by one computer instruction in $O(n^2)$ time.

There is, however, one additional requirement that must be placed on the 
instructions. Even if the instruction does not produce a long word as a result, 
it could take a great deal of time to compute the result. We therefore make the 
additional assumption that the instruction itself, applied to words of length up 
to k, can be performed in $O(k^2)$ steps by a multitape Turing machine. Surely
the typical computer operations, such as addition, shifting, and comparison of 
values, can be done in O(k) steps of a multitape TM, so we are being overly 
liberal in what we allow a computer to do in one instruction. 


Theorem 8.17: If a computer: 


1. Has only instructions that increase the maximum word length by at most 
1, and 


2. Has only instructions that a multitape TM can perform on words of length
k in $O(k^2)$ steps or less,
then the Turing machine described in Section 8.6.2 can simulate n steps of the
computer in $O(n^3)$ of its own steps.


PROOF: Begin by noticing that the first (memory) tape of the TM in Fig. 8.22 
starts with only the computer’s program. That program may be long, but it is 
fixed and of constant length, independent of n, the number of instruction steps 
the computer executes. Thus, there is some constant c that is the length, in bits, of the
longest of the computer’s words and addresses appearing in the program. There is also a
constant d that is the number of words occupied by the program. 

Thus, after executing n steps, the computer cannot have created any words 
longer than c+ n, and therefore, it cannot have created or used any addresses 
that are longer than c+n bits either. Each instruction creates at most one new 
address that gets a value, so the total number of addresses after n instructions 
have been executed is at most d+ n. Since each address-word combination 
requires at most 2(c +n) + 2 bits, including the address, the contents, and two 
marker symbols to separate them, the total number of TM tape cells occupied 
after n instructions have been simulated is at most 2(d + n)(c + n + 1). As c
and d are constants, this number of cells is $O(n^2)$.
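Expanding the product makes the $O(n^2)$ bound explicit; since c and d do not depend on n, only the $2n^2$ term grows faster than linearly:

$$(d+n)\bigl(2(c+n)+2\bigr) = 2(d+n)(c+n+1) = 2n^2 + 2(c+d+1)\,n + 2d(c+1) = O(n^2).$$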

We now know that each of the fixed number of lookups of addresses involved
in one computer instruction can be done in $O(n^2)$ time. Since words are O(n)
in length, our second assumption tells us that the instructions themselves can
each be carried out by a TM in $O(n^2)$ time. The only significant remaining
cost of an instruction is the time it takes the TM to create more space on its
tape to hold a new or expanded word. However, shifting-over involves copying
at most $O(n^2)$ data from tape 1 to the scratch tape and back again. Thus,
shifting-over also requires only $O(n^2)$ time per computer instruction.

We conclude that the TM simulates one step of the computer in $O(n^2)$ of
its own steps. Thus, as we claimed in the theorem statement, n steps of the
computer can be simulated in $O(n^3)$ steps of the Turing machine.


As a final observation, we now see that cubing the number of steps lets a 
multitape TM simulate a computer. We also know from Section 8.4.3 that a 
one-tape TM can simulate a multitape TM by squaring the number of steps, at 
most. Thus: 


Theorem 8.18: A computer of the type described in Theorem 8.17 can be
simulated for n steps by a one-tape Turing machine, using at most $O(n^6)$ steps
of the Turing machine.
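Written out, the two bounds compose as follows: each of the n simulated instructions costs the multitape TM $O(n^2)$ steps, and the one-tape simulation of Section 8.4.3 at most squares that running time:

$$\underbrace{n \cdot O(n^2)}_{\text{multitape TM}} = O(n^3), \qquad \underbrace{O\bigl((n^3)^2\bigr)}_{\text{one-tape TM}} = O(n^6).$$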


8.7 Summary of Chapter 8 


+ The Turing Machine: The TM is an abstract computing machine with 
the power of both real computers and of other mathematical definitions 
of what can be computed. The TM consists of a finite-state control and 
an infinite tape divided into cells. Each cell holds one of a finite number 
of tape symbols, and one cell is the current position of the tape head. The
TM makes moves based on its current state and the tape symbol at the 
cell scanned by the tape head. In one move, it changes state, overwrites 
the scanned cell with some tape symbol, and moves the head one cell left 
or right. 


+ Acceptance by a Turing Machine: The TM starts with its input, a finite-length
string of tape symbols, on its tape, and the rest of the tape containing
the blank symbol on each cell. The blank is one of the tape symbols,
and the input is chosen from a subset of the tape symbols, not including 
blank, called the input symbols. The TM accepts its input if it ever enters 
an accepting state. 


+ Recursively Enumerable Languages: The languages accepted by TM’s are
called recursively enumerable (RE) languages. Thus, the RE languages 
are those languages that can be recognized or accepted by any sort of 
computing device. 


+ Instantaneous Descriptions of a TM: We can describe the current
configuration of a TM by a finite-length string that includes all the tape cells
from the leftmost to the rightmost nonblank. The state and the position 
of the head are shown by placing the state within the sequence of tape 
symbols, just to the left of the cell scanned. 


+ Storage in the Finite Control: Sometimes, it helps to design a TM for a
particular language if we imagine that the state has two or more compo- 
nents. One component is the control component, and functions as a state 
normally does. The other components hold data that the TM needs to 
remember. 


+ Multiple Tracks: It also helps frequently if we think of the tape symbols
as vectors with a fixed number of components. We may visualize each 
component as a separate track of the tape. 


+ Multitape Turing Machines: An extended TM model has some fixed
number of tapes greater than one. A move of this TM is based on the state
and on the vector of symbols scanned by the head on each of the tapes. 
In a move, the multitape TM changes state, overwrites symbols on the 
cells scanned by each of its tape heads, and moves any or all of its tape 
heads one cell in either direction. Although able to recognize certain 
languages faster than the conventional one-tape TM, the multitape TM 
cannot recognize any language that is not RE. 


+ Nondeterministic Turing Machines: The NTM has a finite number of
choices of next move (state, new symbol, and head move) for each state
and symbol scanned. It accepts an input if any sequence of choices leads
to an ID with an accepting state. Although seemingly more powerful than
the deterministic TM, the NTM is not able to recognize any language that
is not RE.


+ Semi-infinite-Tape Turing Machines: We can restrict a TM to have a tape
that is infinite only to the right, with no cells to the left of the initial head 
position. Such a TM can accept any RE language. 


+ Multistack Machines: We can restrict the tapes of a multitape TM to
behave like a stack. The input is on a separate tape, which is read once 
from left-to-right, mimicking the input mode for a finite automaton or 
PDA. A one-stack machine is really a DPDA, while a machine with two 
stacks can accept any RE language. 


+ Counter Machines: We may further restrict the stacks of a multistack
machine to have only one symbol other than a bottom-marker. Thus, 
each stack functions as a counter, allowing us to store a nonnegative 
integer, and to test whether the integer stored is 0, but nothing more. A 
machine with two counters is sufficient to accept any RE language. 


+ Simulating a Turing Machine by a real computer: It is possible, in
principle, to simulate a TM by a real computer if we accept that there is a
potentially infinite supply of a removable storage device such as a disk, 
to simulate the nonblank portion of the TM tape. Since the physical 
resources to make disks are not infinite, this argument is questionable. 
However, since the limits on how much storage exists in the universe are 
unknown and undoubtedly vast, the assumption of an infinite resource, 
as in the TM tape, is realistic in practice and generally accepted. 


+ Simulating a Computer by a Turing Machine: A TM can simulate the
storage and control of a real computer by using one tape to store all the 
locations and their contents: registers, main memory, disks, and other 
storage devices. Thus, we can be confident that something not doable by 
a TM cannot be done by a real computer. 


8.8 Gradiance Problems for Chapter 8 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 8.1: A nondeterministic Turing machine M with start state $q_0$ and
accepting state $q_f$ has the following transition function:

δ(q, a)   0                           1                           B
q0        {(q1, 0, R)}                {(q1, 0, R)}                {(q1, 0, R)}
q1        {(q1, 1, R), (q2, 0, L)}    {(q1, 1, R), (q2, 1, L)}    {(q1, 1, R), (q2, B, L)}
q2        {(qf, 0, R)}                {(q2, 1, L)}                {}
qf        {}                          {}                          {}


Deduce what M does on any input of 0’s and 1’s. Demonstrate your under- 
standing by identifying, from the list below, the ID that cannot be reached on 
some number of moves from the initial ID X [shown on-line by the Gradiance 
system]. 


Problem 8.2: For the Turing machine in Problem 8.1, simulate all sequences 
of 5 moves, starting from initial ID $q_0 1010$. Find, in the list below, one of the
ID’s reachable from the initial ID in exactly 5 moves. 
Problem 8.3: The Turing machine M has: 

1. States q and p; q is the start state. 

2. Tape symbols 0, 1, and B; 0 and 1 are input symbols, and B is the blank. 


3. The next-move function in Fig. 8.23. 


Your problem is to describe the property of an input string that makes M halt. 
Identify a string that makes M halt from the list below. 


State   Tape Symbol   Move
q       0             (q, 0, R)
q       1             (p, 0, R)
q       B             (q, B, R)
p       0             (q, 0, L)
p       1             none (halt)
p       B             (q, 0, L)


Figure 8.23: A Turing machine 


Problem 8.4: Simulate the Turing machine M of Fig. 8.23 on the input 
1010110, and identify one of the ID’s (instantaneous descriptions) of M from 
the list below. 


Problem 8.5: A Turing machine M with start state $q_0$ and accepting state
$q_f$ has the following transition function:

δ(q, a)   0            1            B
q0        (q0, 1, R)   (q1, 1, R)   (qf, B, R)
q1        (q2, 0, L)   (q2, 1, L)   (q2, B, L)
q2        -            (q0, 0, R)   -
qf        -            -            -


Deduce what M does on any input of 0’s and 1’s. Hint: consider what happens 
when M is started in state $q_0$ at the left end of a sequence of any number
of 0’s (including zero of them) and a 1. Demonstrate your understanding by 
identifying the true transition of M from the list below. 
Chapter 9


Undecidability 


This chapter begins by repeating, in the context of Turing machines, the ar- 
gument of Section 8.1, which was a plausibility argument for the existence of 
problems that could not be solved by computer. The problem with the latter 
“proof” was that we were forced to ignore the real limitations that every imple- 
mentation of C (or any other programming language) has on any real computer. 
Yet these limitations, such as the size of the address space, are not fundamental 
limits. Rather, as the years progress we expect computers will grow indefinitely 
in measures such as address-space size, main-memory size, and others. 

By focusing on the Turing machine, where these limitations do not exist, 
we are better able to capture the essential idea of what some computing device 
will be capable of doing, if not today, then at some time in the future. In this 
chapter, we shall give a formal proof of the existence of a problem about Turing 
machines that no Turing machine can solve. Since we know from Section 8.6 
that Turing machines can simulate real computers, even those without the limits 
that we know exist today, we shall have a rigorous argument that the following 
problem: 


• Does this Turing machine accept (the code for) itself as input?


cannot be solved by a computer, no matter how generously we relax those 
practical limits. 

We then divide problems that can be solved by a Turing machine into two 
classes: those that have an algorithm (i.e., a Turing machine that halts whether 
or not it accepts its input), and those that are only solved by Turing machines 
that may run forever on inputs they do not accept. The latter form of accep- 
tance is problematic, since no matter how long the TM runs, we cannot know 
whether the input is accepted or not. Thus, we shall concentrate on techniques 
for showing problems to be “undecidable,” i.e., to have no algorithm, regardless 
of whether or not they are accepted by a Turing machine that fails to halt on 
some inputs. 

We prove undecidable the following problem: 
• Does this Turing machine accept this input?


Then, we exploit this undecidability result to exhibit a number of other un- 
decidable problems. For instance, we show that all nontrivial problems about 
the language accepted by a Turing machine are undecidable, as are a number 
of problems that have nothing at all to do with Turing machines, programs, or 
computers. 


9.1 A Language That Is Not Recursively Enumerable


Recall that a language L is recursively enumerable (abbreviated RE) if L = 
L(M) for some TM M. Also, we shall in Section 9.2 introduce “recursive” 
or “decidable” languages that are not only recursively enumerable, but are 
accepted by a TM that always halts, regardless of whether or not it accepts. 

Our long-range goal is to prove undecidable the language consisting of pairs 
(M, w) such that: 


1. M is a Turing machine (suitably coded, in binary) with input alphabet {0,1},
2. w is a string of 0’s and 1’s, and 


3. M accepts input w. 


If this problem with inputs restricted to the binary alphabet is undecidable, 
then surely the more general problem, where TM’s may have any alphabet, is 
undecidable. 

Our first step is to set this question up as a true question about membership 
in a particular language. Thus, we must give a coding for Turing machines that 
uses only 0’s and 1’s, regardless of how many states the TM has. Once we have 
this coding, we can treat any binary string as if it were a Turing machine. If the 
string is not a well-formed representation of some TM, we may think of it as 
representing a TM with no moves. Thus, we may think of every binary string 
as some TM. 

An intermediate goal, and the subject of this section, involves the language 
$L_d$, the “diagonalization language,” which consists of all those strings w such
that the TM represented by w does not accept the input w. We shall show that
$L_d$ has no Turing machine at all that accepts it. Remember that showing there
is no Turing machine at all for a language is showing something stronger than 
that the language is undecidable (i.e., that it has no algorithm, or TM that 
always halts). 

The language $L_d$ plays a role analogous to the hypothetical program $H_2$
of Section 8.1.2, which prints hello, world whenever its input does not print
hello, world when given itself as input. More precisely, just as $H_2$ cannot
exist because its response when given itself as input is paradoxical, $L_d$ cannot
be accepted by a Turing machine, because if it were, then that Turing machine
would have to disagree with itself when given a code for itself as input.


9.1.1 Enumerating the Binary Strings 


In what follows, we shall need to assign integers to all the binary strings so 
that each string corresponds to one integer, and each integer corresponds to 
one string. If w is a binary string, treat 1w as a binary integer i. Then we
shall call w the ith string. That is, ε is the first string, 0 is the second, 1 the
third, 00 the fourth, 01 the fifth, and so on. Equivalently, strings are ordered
by length, and strings of equal length are ordered lexicographically. Hereafter,
we shall refer to the ith string as $w_i$.
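The numbering is easy to compute in both directions. In this sketch (function names ours), writing i in binary and dropping the leading 1 gives $w_i$, and reading 1w as a binary integer recovers i.

```python
def ith_string(i):
    """Return w_i: write i in binary and drop the leading 1."""
    if i < 1:
        raise ValueError("strings are numbered starting from 1")
    return f"{i:b}"[1:]


def index_of(w):
    """Inverse of ith_string: read 1w as a binary integer."""
    return int("1" + w, 2)


print([ith_string(i) for i in range(1, 8)])  # ['', '0', '1', '00', '01', '10', '11']
```

Because the leading 1 fixes the length and the remaining bits compare as binary numbers, this order is exactly "shorter first, then lexicographic."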


9.1.2 Codes for Turing Machines 


Our next goal is to devise a binary code for Turing machines so that each TM 
with input alphabet {0,1} may be thought of as a binary string. Since we just 
saw how to enumerate the binary strings, we shall then have an identification of 
the Turing machines with the integers, and we can talk about “the ith Turing
machine, $M_i$.” To represent a TM $M = (Q, \{0,1\}, \Gamma, \delta, q_1, B, F)$ as a binary
string, we must first assign integers to the states, tape symbols, and directions
L and R.


• We shall assume the states are $q_1, q_2, \ldots, q_r$ for some r. The start state
will always be $q_1$, and $q_2$ will be the only accepting state. Note that, since
we may assume the TM halts whenever it enters an accepting state, there
is never any need for more than one accepting state.


• We shall assume the tape symbols are $X_1, X_2, \ldots, X_s$ for some s. $X_1$
always will be the symbol 0, $X_2$ will be 1, and $X_3$ will be B, the blank.
However, other tape symbols can be assigned to the remaining integers
arbitrarily.


• We shall refer to direction L as $D_1$ and direction R as $D_2$.


Since each TM M can have integers assigned to its states and tape symbols in 
many different orders, there will be more than one encoding of the typical TM. 
However, that fact is unimportant in what follows, since we shall show that no 
encoding can represent a TM M such that $L(M) = L_d$.

Once we have established an integer to represent each state, symbol, and
direction, we can encode the transition function δ. Suppose one transition rule
is $\delta(q_i, X_j) = (q_k, X_l, D_m)$, for some integers i, j, k, l, and m. We shall code
this rule by the string $0^i10^j10^k10^l10^m$. Notice that, since all of i, j, k, l, and m
are at least one, there are no occurrences of two or more consecutive 1’s within
the code for a single transition.
A code for the entire TM M consists of all the codes for the transitions, in
some order, separated by pairs of 1’s:

$C_1 11 C_2 11 \cdots C_{n-1} 11 C_n$,

where each of the C’s is the code for one transition of M.

Example 9.1: Let the TM in question be
$M = (\{q_1, q_2, q_3\}, \{0, 1\}, \{0, 1, B\}, \delta, q_1, B, \{q_2\})$


where δ consists of the rules:


$\delta(q_1, 1) = (q_3, 0, R)$
$\delta(q_3, 0) = (q_1, 1, R)$
$\delta(q_3, 1) = (q_2, 0, R)$
$\delta(q_3, B) = (q_3, 1, L)$


The codes for each of these rules, respectively, are: 


0100100010100 
0001010100100 
00010010010100 
0001000100010010 


For example, the first rule can be written as $\delta(q_1, X_2) = (q_3, X_1, D_2)$, since
$1 = X_2$, $0 = X_1$, and $R = D_2$. Thus, its code is $0^1 1 0^2 1 0^3 1 0^1 1 0^2$, as was
indicated above. A code for M is:


01001000101001100010101001001100010010010100110001000100010010 


Note that there are many other possible codes for M. In particular, the codes 
for the four transitions may be listed in any of 4! orders, giving us 24 codes for 
M. 
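The coding scheme is mechanical enough to express in a few lines. This sketch assumes the tape alphabet is just {0, 1, B} (further symbols $X_4, X_5, \ldots$ would need more entries in the hypothetical `SYMBOL` table), and all names are ours. Run on the rules of Example 9.1 in the order listed, it reproduces the code for M given above.

```python
# Integer assignments from Section 9.1.2: X1 = 0, X2 = 1, X3 = B; D1 = L, D2 = R.
# Only these three tape symbols are handled here (a simplification).
SYMBOL = {"0": 1, "1": 2, "B": 3}
DIRECTION = {"L": 1, "R": 2}


def encode_transition(i, x, k, y, d):
    """Code the rule delta(q_i, x) = (q_k, y, d) as 0^i 1 0^j 1 0^k 1 0^l 1 0^m."""
    exponents = [i, SYMBOL[x], k, SYMBOL[y], DIRECTION[d]]
    return "1".join("0" * e for e in exponents)


def encode_tm(rules):
    """Concatenate the transition codes, separated by 11."""
    return "11".join(encode_transition(*rule) for rule in rules)


example_rules = [
    (1, "1", 3, "0", "R"),  # delta(q1, 1) = (q3, 0, R)
    (3, "0", 1, "1", "R"),  # delta(q3, 0) = (q1, 1, R)
    (3, "1", 2, "0", "R"),  # delta(q3, 1) = (q2, 0, R)
    (3, "B", 3, "1", "L"),  # delta(q3, B) = (q3, 1, L)
]
# Prints the 62-bit code for M shown in Example 9.1.
print(encode_tm(example_rules))
```

Permuting `example_rules` yields the other 23 codes for M mentioned in the example.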


In Section 9.2.3, we shall have need to code pairs consisting of a TM and a 
string, (M,w). For this pair we use the code for M followed by 111, followed 
by w. Note that, since no valid code for a TM contains three 1’s in a row, we 
can be sure that the first occurrence of 111 separates the code for M from w. 
For instance, if M were the TM of Example 9.1, and w were 1011, then the 
code for (M, w) would be the string shown at the end of Example 9.1 followed 
by 1111011. 
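Pairing and unpairing are a concatenation and a search for the first 111. A sketch, with names of our own choosing; it relies on the fact that a TM code never contains 111 and, if nonempty, ends in 0, so the first 111 begins right after the code even when w starts with 1.

```python
def encode_pair(tm_code, w):
    """Code the pair (M, w): the code for M, then 111, then w."""
    return tm_code + "111" + w


def decode_pair(s):
    """Recover (code for M, w) by splitting at the first occurrence of 111."""
    cut = s.find("111")
    if cut < 0:
        raise ValueError("no 111 separator found")
    return s[:cut], s[cut + 3:]


# The code for M from Example 9.1, paired with w = 1011.
m_code = "01001000101001100010101001001100010010010100110001000100010010"
pair = encode_pair(m_code, "1011")
print(pair.endswith("1111011"))               # True
print(decode_pair(pair) == (m_code, "1011"))  # True
```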


9.1.3 The Diagonalization Language 


In Section 9.1.2 we coded Turing machines so there is now a concrete notion of
$M_i$, the “ith Turing machine”: that TM M whose code is $w_i$, the ith binary
string. Many integers do not correspond to any TM at all. For instance, 11001
does not begin with 0, and 0010111010010100 is not valid because it has three
consecutive 1’s. If $w_i$ is not a valid TM code, we shall take $M_i$ to be the TM
with one state and no transitions. That is, for these values of i, $M_i$ is a Turing
machine that immediately halts on any input. Thus, $L(M_i)$ is $\emptyset$ if $w_i$ fails to
be a valid TM code.
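One way to decide whether a binary string is a valid TM code is sketched below. This is our own reading of "well-formed": the string must not contain 111, it must split on 11 into transition codes $0^i10^j10^k10^l10^m$ with m equal to 1 or 2 (L or R), and, as an extra assumption for deterministic machines, no two transitions may share the same state and symbol.

```python
import re

# One transition code 0^i 1 0^j 1 0^k 1 0^l 1 0^m, with m = 1 (L) or m = 2 (R).
TRANSITION = re.compile(r"(0+)1(0+)1(0+)1(0+)1(00?)")


def is_valid_tm_code(w):
    """Hypothetical well-formedness test for a binary string as a TM code."""
    if not w or "111" in w:
        return False
    seen = set()
    for code in w.split("11"):
        match = TRANSITION.fullmatch(code)
        if match is None:
            return False
        state_and_symbol = (len(match.group(1)), len(match.group(2)))
        if state_and_symbol in seen:  # two moves for the same (q_i, X_j)
            return False
        seen.add(state_and_symbol)
    return True


print(is_valid_tm_code("11001"))             # False (does not begin with 0)
print(is_valid_tm_code("0010111010010100"))  # False (contains 111)
```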

Now, we can make a vital definition. 


• The language $L_d$, the diagonalization language, is the set of strings $w_i$
such that $w_i$ is not in $L(M_i)$.


That is, $L_d$ consists of all strings w such that the TM M whose code is w does
not accept when given w as input.

The reason $L_d$ is called a “diagonalization” language can be seen if we
consider Fig. 9.1. This table tells, for all i and j, whether the TM $M_i$ accepts
input string $w_j$; 1 means “yes it does” and 0 means “no it doesn’t.”¹ We may
think of the ith row as the characteristic vector for the language $L(M_i)$; that
is, the 1’s in this row indicate the strings that are members of this language.


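[Figure: a table whose rows are indexed by i (the Turing machines $M_i$) and whose columns are indexed by j (the strings $w_j$); the entry in row i, column j is 1 if $M_i$ accepts $w_j$ and 0 otherwise, and the entries on the diagonal are marked.]
Figure 9.1: The table that represents acceptance of strings by Turing machines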


The diagonal values tell whether $M_i$ accepts $w_i$. To construct $L_d$, we complement
the diagonal. For instance, if Fig. 9.1 were the correct table, then
the complemented diagonal would begin 1, 0, 0, 0, .... Thus, $L_d$ would contain
$w_1 = \epsilon$, not contain $w_2$ through $w_4$, which are 0, 1, and 00, and so on.

The trick of complementing the diagonal to construct the characteristic
vector of a language that cannot be the language that appears in any row
is called diagonalization. It works because the complement of the diagonal is
itself a characteristic vector describing membership in some language, namely
$L_d$. This characteristic vector disagrees with every row of the table suggested
by Fig. 9.1: for each i, it disagrees with row i in column i. Thus, the complement
of the diagonal cannot be the characteristic vector of any Turing machine.
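A finite corner of the table makes the argument concrete. The 0/1 entries below are invented for illustration (chosen so that the complemented diagonal begins 1, 0, 0, 0 as in the text) and are not the real values, which, as the footnote notes, are solid 0's in the top rows. Only a finite prefix can be checked this way; the proof itself concerns the infinite table. Indices here are 0-based, so `table[0]` plays the role of $M_1$.

```python
def complemented_diagonal(table):
    """table[i][j] == 1 means the ith machine accepts the jth string.
    Flip each diagonal entry to get a prefix of the characteristic
    vector of L_d."""
    return [1 - table[i][i] for i in range(len(table))]


# A made-up 4x4 corner of the table, for illustration only.
table = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
d = complemented_diagonal(table)
print(d)  # [1, 0, 0, 0]

# d differs from row i in column i, so it equals no row.
print(all(d[i] != row[i] for i, row in enumerate(table)))  # True
```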


¹ You should note that the actual table does not look anything like the one suggested by
the figure. Since all low integers fail to represent a valid TM code, and thus represent the 
trivial TM that makes no moves, the top rows of the table are in fact solid 0’s. 
9.1.4 Proof That $L_d$ Is Not Recursively Enumerable


Following the above intuition about characteristic vectors and the diagonal, we 
shall now prove formally a fundamental result about Turing machines: there is 
no Turing machine that accepts the language $L_d$.


Theorem 9.2: $L_d$ is not a recursively enumerable language. That is, there is
no Turing machine that accepts $L_d$.


PROOF: Suppose $L_d$ were $L(M)$ for some TM M. Since $L_d$ is a language over
alphabet {0,1}, M would be in the list of Turing machines we have constructed,
since it includes all TM’s with input alphabet {0,1}. Thus, there is at least
one code for M, say i; that is, $M = M_i$.

Now, ask if $w_i$ is in $L_d$.


• If $w_i$ is in $L_d$, then $M_i$ accepts $w_i$. But then, by definition of $L_d$, $w_i$ is not
in $L_d$, because $L_d$ contains only those $w_j$ such that $M_j$ does not accept
$w_j$.


• Similarly, if $w_i$ is not in $L_d$, then $M_i$ does not accept $w_i$. Thus, by
definition of $L_d$, $w_i$ is in $L_d$.


Since $w_i$ can neither be in $L_d$ nor fail to be in $L_d$, we conclude that there is a
contradiction of our assumption that M exists. That is, $L_d$ is not a recursively
enumerable language.


9.1.5 Exercises for Section 9.1 
Exercise 9.1.1: What strings are:

* a) w_37?

b) w_100?


Exercise 9.1.2: Write one of the possible codes for the Turing machine of 
Fig. 8.9. 


Exercise 9.1.3: Here are two definitions of languages that are similar to the 
definition of La, yet different from that language. For each, show that the 
language is not accepted by a Turing machine, using a diagonalization-type 
argument. Note that you cannot develop an argument based on the diagonal 
itself, but must find another infinite sequence of points in the matrix suggested 
by Fig. 9.1. 


* a) The set of all w_i such that w_i is not accepted by M_{2i}.


b) The set of all w_i such that w_{2i} is not accepted by M_i.




Exercise 9.1.4: We have considered only Turing machines that have input alphabet {0,1}. Suppose that we wanted to assign an integer to all Turing machines, regardless of their input alphabet. That is not quite possible because, while the names of the states or noninput tape symbols are arbitrary, the particular input symbols matter. For instance, the languages {0^n1^n | n ≥ 1} and {a^nb^n | n ≥ 1}, while similar in some sense, are not the same language, and they are accepted by different TM’s. However, suppose that we have an infinite set of symbols, {a_1, a_2, ...}, from which all TM input alphabets are chosen. Show how we could assign an integer to all TM’s that had a finite subset of these symbols as their input alphabet.


9.2 An Undecidable Problem That Is RE 


Now, we have seen a problem, the diagonalization language L_d, that has no Turing machine to accept it. Our next goal is to refine the structure of the recursively enumerable (RE) languages (those that are accepted by TM’s) into two classes. One class, which corresponds to what we commonly think of as an algorithm, has a TM that not only recognizes the language, but also tells us when it has decided the input string is not in the language. Such a Turing machine always halts eventually, regardless of whether or not it reaches an accepting state.

The second class of languages consists of those RE languages that are not 
accepted by any Turing machine with the guarantee of halting. These languages 
are accepted in an inconvenient way: if the input is in the language, we’ll 
eventually know that, but if the input is not in the language, then the Turing 
machine may run forever, and we shall never be sure the input won’t be accepted 
eventually. An example of this type of language, as we shall see, is the set of 
coded pairs (M,w) such that TM M accepts input w. 


9.2.1 Recursive Languages 


We call a language L recursive if L = L(M) for some Turing machine M such 
that: 


1. If w is in L, then M accepts (and therefore halts).


2. If w is not in L, then M eventually halts, although it never enters an 
accepting state. 


A TM of this type corresponds to our informal notion of an “algorithm,” a 
well-defined sequence of steps that always finishes and produces an answer. 
If we think of the language L as a “problem,” as will be the case frequently, 
then problem L is called decidable if it is a recursive language, and it is called 
undecidable if it is not a recursive language. 

Figure 9.2: Relationship between the recursive, RE, and non-RE languages (nested rings: recursive innermost, RE but not recursive in the middle, non-RE outermost)


The existence or nonexistence of an algorithm to solve a problem is often of more importance than the existence of some TM to solve the problem. As mentioned above, the Turing machines that are not guaranteed to halt may not give us enough information ever to conclude that a string is not in the language, so there is a sense in which they have not “solved the problem.” Thus, dividing problems or languages between the decidable (those that are solved by an algorithm) and those that are undecidable is often more important than the division between the recursively enumerable languages (those that have TM’s of some sort) and the non-recursively-enumerable languages (which have no TM at all). Figure 9.2 suggests the relationship among three classes of languages:


1. The recursive languages. 
2. The languages that are recursively enumerable but not recursive. 


3. The non-recursively-enumerable (non-RE) languages. 


We have positioned the non-RE language L_d properly, and we also show the language L_u, or “universal language,” that we shall prove shortly not to be recursive, although it is RE.


9.2.2 Complements of Recursive and RE languages 


A powerful tool in proving languages to belong in the second ring of Fig. 9.2 (i.e., to be RE, but not recursive) is consideration of the complement of the language. We shall show that the recursive languages are closed under complementation. Thus, if a language L is RE, but L̄, the complement of L, is not RE, then we know L cannot be recursive. For if L were recursive, then L̄ would also be recursive and thus surely RE. We now prove this important closure property of the recursive languages.


Theorem 9.3: If L is a recursive language, so is L̄.


Why “Recursive”?


Programmers today are familiar with recursive functions. Yet these recursive functions don’t seem to have anything to do with Turing machines that always halt. Worse, the opposite (nonrecursive or undecidable) refers to languages that cannot be recognized by any algorithm, yet we are accustomed to thinking of “nonrecursive” as referring to computations that are so simple there is no need for recursive function calls.

The term “recursive,” as a synonym for “decidable,” goes back to 
Mathematics as it existed prior to computers. Then, formalisms for com- 
putation based on recursion (but not iteration or loops) were commonly 
used as a notion of computation. These notations, which we shall not 
cover here, had some of the flavor of computation in functional program- 
ming languages such as LISP or ML. In that sense, to say a problem was 
“recursive” had the positive sense of “it is sufficiently simple that I can 
write a recursive function to solve it, and the function always finishes.” 
That is exactly the meaning carried by the term today, in connection with 
Turing machines. 

The term “recursively enumerable” harks back to the same family of 
concepts. A function could list all the members of a language, in some 
order; that is, it could “enumerate” them. The languages that can have 
their members listed in some order are the same as the languages that are 
accepted by some TM, although that TM might run forever on inputs that 
it does not accept. 


PROOF: Let L = L(M) for some TM M that always halts. We construct a TM M̄ such that L̄ = L(M̄) by the construction suggested in Fig. 9.3. That is, M̄ behaves just like M. However, M is modified as follows to create M̄:


1. The accepting states of M are made nonaccepting states of M̄ with no transitions; i.e., in these states M̄ will halt without accepting.


2. M̄ has a new accepting state r; there are no transitions from r.


3. For each combination of a nonaccepting state of M and a tape symbol of M such that M has no transition (i.e., M halts without accepting), add a transition to the accepting state r.


Since M is guaranteed to halt, we know that M̄ is also guaranteed to halt. Moreover, M̄ accepts exactly those strings that M does not accept. Thus M̄ accepts L̄.
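The three modifications are mechanical enough to write down. This is a minimal Python sketch, not the book's formal construction: a TM is represented as a dict from (state, symbol) to (state, symbol, move) with moves ±1 and "B" as the blank, and the names `complement_tm`, `run`, and the fresh state `"r"` are hypothetical. Like the proof, it assumes M always halts, so `run` has no step limit.

```python
BLANK = "B"


def complement_tm(delta, states, accepting, tape_symbols, new_accept="r"):
    """Return (delta2, accepting2) for a TM accepting the complement of L(M).

    Assumes M always halts; delta maps (state, symbol) -> (state, symbol, move).
    """
    assert new_accept not in states
    # Step 1: old accepting states keep no transitions and no longer accept.
    delta2 = {(q, x): t for (q, x), t in delta.items() if q not in accepting}
    # Step 3: wherever M would halt in a nonaccepting state, go to the new state.
    for q in states - accepting:
        for x in tape_symbols:
            if (q, x) not in delta:
                delta2[(q, x)] = (new_accept, x, +1)
    # Step 2: the new state is the only accepting state; it has no transitions.
    return delta2, {new_accept}


def run(delta, accepting, start, w):
    """Simulate a TM that is assumed to halt on w; True iff it accepts."""
    tape = dict(enumerate(w))
    state, head = start, 0
    while state not in accepting and (state, tape.get(head, BLANK)) in delta:
        state, symbol, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += move
    return state in accepting


# Toy M (our own): accepts exactly the strings that begin with 0.
delta = {("q0", "0"): ("qf", "0", +1)}
states, accepting = {"q0", "qf"}, {"qf"}
delta2, accepting2 = complement_tm(delta, states, accepting, {"0", "1", BLANK})
for w in ["", "0", "01", "1", "10"]:
    print(repr(w), run(delta, accepting, "q0", w), run(delta2, accepting2, "q0", w))
```

Each printed line shows opposite answers from M and from the constructed machine, which is exactly the claim L(M̄) = L̄ on these inputs.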


There is another important fact about complements of languages that fur- 
ther restricts where in the diagram of Fig. 9.2 a language and its complement 
can fall. We state this restriction in the next theorem. 
Figure 9.3: Construction of a TM accepting the complement of a recursive language


Theorem 9.4: If both a language L and its complement are RE, then L is recursive. Note that then by Theorem 9.3, L̄ is recursive as well.


PROOF: The proof is suggested by Fig. 9.4. Let L = L(M_1) and L̄ = L(M_2). Both M_1 and M_2 are simulated in parallel by a TM M. We can make M a two-tape TM, and then convert it to a one-tape TM, to make the simulation easy and obvious. One tape of M simulates the tape of M_1, while the other tape of M simulates the tape of M_2. The states of M_1 and M_2 are each components of the state of M.


Figure 9.4: Simulation of two TM’s accepting a language and its complement (an accept by M_1 makes M accept; an accept by M_2 makes M reject)


If input w to M is in L, then M_1 will eventually accept. If so, M accepts and halts. If w is not in L, then it is in L̄, so M_2 will eventually accept. When M_2 accepts, M halts without accepting. Thus, on all inputs, M halts, and L(M) is exactly L. Since M always halts, and L(M) = L, we conclude that L is recursive.
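The parallel simulation can be imitated by stepping two machines alternately. In this Python sketch each machine is modeled, by our own convention, as a generator function: every `next()` is one step, and the generator finishes with value `True` when the machine accepts. The function `decide` and the toy semi-deciders from `parity_machine` are hypothetical illustrations, not constructions from the text.

```python
def decide(m1, m2, w):
    """Decide L, given semi-deciders m1 for L and m2 for its complement.

    Each machine is a generator function: every next() is one step, and the
    generator finishes with value True when the machine accepts.
    """
    runs = {True: m1(w), False: m2(w)}  # key: our answer if that machine accepts
    while runs:
        for answer, run in list(runs.items()):
            try:
                next(run)  # one step of this machine
            except StopIteration as halt:
                if halt.value:
                    return answer
                del runs[answer]  # halted without accepting: stop simulating it
    raise ValueError("neither machine accepts w, so L(m2) is not the complement")


def parity_machine(target):
    """Toy semi-decider (our own): accepts iff the number of 1s is congruent
    to target mod 2, and otherwise runs forever instead of rejecting."""
    def machine(w):
        ones = 0
        for c in w:
            if c == "1":
                ones += 1
            yield  # one simulated step per input symbol
        if ones % 2 == target:
            return True
        while True:
            yield  # never accepts, never halts
    return machine


m1, m2 = parity_machine(0), parity_machine(1)  # L = strings with an even number of 1s
print([decide(m1, m2, w) for w in ["", "1", "11", "101", "111"]])
# [True, False, True, True, False]
```

Neither toy machine ever rejects on its own, yet `decide` always returns, because exactly one of the two must accept.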


We may summarize Theorems 9.3 and 9.4 as follows. Of the nine possible ways to place a language L and its complement L̄ in the diagram of Fig. 9.2, only the following four are possible:


1. Both L and L̄ are recursive; i.e., both are in the inner ring.
2. Neither L nor L̄ is RE; i.e., both are in the outer ring.


3. L is RE but not recursive, and L̄ is not RE; i.e., one is in the middle ring and the other is in the outer ring.


4. L̄ is RE but not recursive, and L is not RE; i.e., the same as (3), but with L and L̄ swapped.


In proof of the above, Theorem 9.3 eliminates the possibility that one language (L or L̄) is recursive and the other is in either of the other two classes. Theorem 9.4 eliminates the possibility that both are RE but not recursive.


Example 9.5: As an example, consider the language L_d, which we know is not RE. Thus, L̄_d could not be recursive. It is, however, possible that L̄_d could be either non-RE or RE-but-not-recursive. It is in fact the latter.

L̄_d is the set of strings w_i such that M_i accepts w_i. This language is similar to the universal language L_u consisting of all pairs (M, w) such that M accepts w, which we shall show in Section 9.2.3 is RE. The same argument can be used to show L̄_d is RE.


9.2.3 The Universal Language 


We already discussed informally in Section 8.6.2 how a Turing machine could be 
used to simulate a computer that had been loaded with an arbitrary program. 
That is to say, a single TM can be used as a “stored program computer,” 
taking its program as well as its data from one or more tapes on which input is 
placed. In this section, we shall repeat the idea with the additional formality 
that comes with talking about the Turing machine as our representation of a 
stored program. 

We define L_u, the universal language, to be the set of binary strings that encode, in the notation of Section 9.1.2, a pair (M,w), where M is a TM with the binary input alphabet, and w is a string in (0+1)*, such that w is in L(M). That is, L_u is the set of strings representing a TM and an input accepted by that TM. We shall show that there is a TM U, often called the universal Turing machine, such that L_u = L(U). Since the input to U is a binary string, U is in fact some M_i in the list of binary-input Turing machines we developed in
Section 9.1.2. 

It is easiest to describe U as a multitape Turing machine, in the spirit of 
Fig. 8.22. In the case of U, the transitions of M are stored initially on the first 
tape, along with the string w. A second tape will be used to hold the simulated 
tape of M, using the same format as for the code of M. That is, tape symbol X_i of M will be represented by 0^i, and tape symbols will be separated by single 1’s. The third tape of U holds the state of M, with state q_i represented by i 0’s. A sketch of U is in Fig. 9.5.

The operation of U can be summarized as follows: 


1. Examine the input to make sure that the code for M is a legitimate code 
for some TM. If not, U halts without accepting. Since invalid codes are 
assumed to represent the TM with no moves, and such a TM accepts no 
inputs, this action is correct. 
Figure 9.5: Organization of a universal Turing machine


A More Efficient Universal TM 


An efficient simulation of M by U, one that would not require us to shift symbols on the tape, would have U first determine the number of tape symbols M used. If there are between 2^{k-1} + 1 and 2^k symbols, U could use a k-bit binary code to represent the different tape symbols uniquely. Tape cells of M could be simulated by k of U’s tape cells. To make things even easier, the given transitions of M could be rewritten by U to use the fixed-length binary code instead of the variable-length unary code we introduced.
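The arithmetic in the box is a ceiling logarithm. A small Python sketch; the names are ours, and numbering X_1, X_2, ... as the codes 0, 1, ... is our own choice of fixed-length code.

```python
def bits_needed(num_symbols: int) -> int:
    """Smallest k >= 1 with num_symbols <= 2**k."""
    if num_symbols < 1:
        raise ValueError("a TM has at least one tape symbol")
    return max(1, (num_symbols - 1).bit_length())


def fixed_width_code(j: int, k: int) -> str:
    """k-bit code for tape symbol X_j, with X_1 -> 00...0 (our own numbering)."""
    return format(j - 1, f"0{k}b")


k = bits_needed(3)  # 0, 1, and blank: X_1, X_2, X_3
print(k, [fixed_width_code(j, k) for j in (1, 2, 3)])  # 2 ['00', '01', '10']
```

For 2^{k-1} + 1 through 2^k symbols, `(num_symbols - 1).bit_length()` is exactly k, which matches the range stated above.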


2. Initialize the second tape to contain the input w, in its encoded form. 
That is, for each 0 of w, place 10 on the second tape, and for each 1 of 
w, place 100 there. Note that the blanks on the simulated tape of M, 
which are represented by 1000, will not actually appear on that tape; all 
cells beyond those used for w will hold the blank of U. However, U knows 
that, should it look for a simulated symbol of M and find its own blank, 
it must replace that blank by the sequence 1000 to simulate the blank of 
M. 
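Step 2 is a simple string translation. A Python sketch, assuming the numbering X_1 = 0, X_2 = 1, X_3 = blank that the codes 10, 100, and 1000 above imply; the function name is ours.

```python
def encode_input(w: str) -> str:
    """Tape-2 form of w: symbol X_j becomes a 1 followed by j 0's."""
    codes = {"0": "10", "1": "100"}  # X_1 = 0, X_2 = 1; the blank X_3 would be 1000
    return "".join(codes[c] for c in w)


print(encode_input("01"))  # 10100
```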


3. Place 0, the start state of M, on the third tape, and move the head of 
U’s second tape to the first simulated cell. 


4. To simulate a move of M, U searches on its first tape for a transition 0^i 1 0^j 1 0^k 1 0^l 1 0^m, such that 0^i is the state on tape 3, and 0^j is the tape symbol of M that begins at the position on tape 2 scanned by U. This transition is the one M would next make. U should:


(a) Change the contents of tape 3 to 0^k; that is, simulate the state change of M. To do so, U first changes all the 0’s on tape 3 to blanks, and then copies 0^k from tape 1 to tape 3.


(b) Replace 0^j on tape 2 by 0^l; that is, change the tape symbol of M. If more or less space is needed (i.e., j ≠ l), use the scratch tape and the shifting-over technique of Section 8.6.2 to manage the spacing.


(c) Move the head on tape 2 to the position of the next 1 to the left 
or right, respectively, depending on whether m = 1 (move left) or 
m = 2 (move right). Thus, U simulates the move of M to the left or 
to the right. 


5. If M has no transition that matches the simulated state and tape symbol, 
then in (4), no transition will be found. Thus, M halts in the simulated 
configuration, and U must do likewise. 


6. If M enters its accepting state, then U accepts. 


In this manner, U simulates M on w. U accepts the coded pair (M,w) if and 
only if M accepts w. 
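The six steps can be mirrored in a short program. This Python sketch assumes the coding of Section 9.1.2: transition δ(q_i, X_j) = (q_k, X_l, D_m) is written 0^i 1 0^j 1 0^k 1 0^l 1 0^m, transitions are separated by 11, the code for M is separated from w by 111, q_1 is the start state, q_2 is the only accepting state, X_1, X_2, X_3 are 0, 1, and blank, and D_1, D_2 are left and right. Rather than three tapes, it keeps the simulated tape in a dictionary, and it treats a code with two transitions for the same state and symbol as invalid. The names `parse_tm` and `universal` and the `max_steps` cutoff are our additions; the cutoff exists only so the sketch always returns, whereas the real U has no such limit.

```python
BLANK = 3  # X_3 is the blank; X_1 = 0 and X_2 = 1


def parse_tm(code):
    """Return {(i, j): (k, l, m)}, or None if code is not a valid TM code."""
    delta = {}
    for transition in code.split("11"):
        fields = transition.split("1")
        if len(fields) != 5 or "" in fields:
            return None
        i, j, k, l, m = (len(f) for f in fields)
        if m not in (1, 2) or (i, j) in delta:
            return None
        delta[(i, j)] = (k, l, m)
    return delta


def universal(pair, max_steps=10_000):
    """True if M accepts w, False if M halts without accepting (or the code
    is invalid), None if max_steps runs out first."""
    sep = pair.find("111")  # a valid code for M never contains 111
    delta = parse_tm(pair[:sep]) if sep != -1 else None
    if delta is None:
        return False  # invalid code: the TM with no moves accepts nothing
    tape = {p: (1 if c == "0" else 2) for p, c in enumerate(pair[sep + 3:])}
    state, head = 1, 0  # start in q_1 on the first input cell
    for _ in range(max_steps):
        if state == 2:  # q_2 is the accepting state
            return True
        move = delta.get((state, tape.get(head, BLANK)))
        if move is None:
            return False  # no transition applies: M halts without accepting
        state, symbol, direction = move
        tape[head] = symbol
        head += -1 if direction == 1 else 1  # D_1 = left, D_2 = right
    return True if state == 2 else None


accepts_if_first_is_1 = "0100100100100"  # δ(q_1, X_2) = (q_2, X_2, R)
print(universal(accepts_if_first_is_1 + "111" + "10"))  # True
print(universal(accepts_if_first_is_1 + "111" + "01"))  # False
```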


9.2.4 Undecidability of the Universal Language 


We can now exhibit a problem that is RE but not recursive; it is the language L_u. Knowing that L_u is undecidable (i.e., not a recursive language) is in many ways more valuable than our previous discovery that L_d is not RE. The reason is that the reduction of L_u to another problem P can be used to show there is no algorithm to solve P, regardless of whether or not P is RE. However, reduction of L_d to P is only possible if P is not RE, so L_d cannot be used to show undecidability for those problems that are RE but not recursive. On the other hand, if we want to show a problem not to be RE, then only L_d can be used; L_u is useless since it is RE.


Theorem 9.6: L_u is RE but not recursive.


PROOF: We just proved in Section 9.2.3 that L_u is RE. Suppose L_u were recursive. Then by Theorem 9.3, L̄_u, the complement of L_u, would also be recursive. However, if we have a TM M to accept L̄_u, then we can construct a TM to accept L_d (by a method explained below). Since we already know that L_d is not RE, we have a contradiction of our assumption that L_u is recursive.

Suppose L(M) = L̄_u. As suggested by Fig. 9.6, we can modify TM M into a TM M’ that accepts L_d as follows.


1. Given string w on its input, M’ changes the input to w111w. You may, as an exercise, write a TM program to do this step on a single tape.


The Halting Problem


One often hears of the halting problem for Turing machines as a problem similar to L_u: one that is RE but not recursive. In fact, the original Turing machine of A. M. Turing accepted by halting, not by final state. We could define H(M) for TM M to be the set of inputs w such that M halts given input w, regardless of whether or not M accepts w. Then, the halting problem is the set of pairs (M,w) such that w is in H(M). This problem/language is another example of one that is RE but not recursive.


Figure 9.6: Reduction of L_d to L̄_u (M’ for L_d copies its input w to w111w and passes it to a hypothetical algorithm M for L̄_u, accepting when M accepts and rejecting when M rejects)


However, an easy argument that it can be done is to use a second tape to 
copy w, and then convert the two-tape TM to a one-tape TM. 


2. M’ simulates M on the new input. If w is w_i in our enumeration, then M’ determines whether M_i accepts w_i. Since M accepts L̄_u, it will accept if and only if M_i does not accept w_i; i.e., w_i is in L_d.


Thus, M’ accepts w if and only if w is in L_d. Since we know M’ cannot exist by Theorem 9.2, we conclude that L_u is not recursive.


9.2.5 Exercises for Section 9.2 


Exercise 9.2.1: Show that the halting problem, the set of (M, w) pairs such 
that M halts (with or without accepting) when given input w is RE but not 
recursive. (See the box on “The Halting Problem” in Section 9.2.4.) 


Exercise 9.2.2: In the box “Why ‘Recursive’?” in Section 9.2.1 we suggested that there was a notion of “recursive function” that competed with the Turing machine as a model for what can be computed. In this exercise, we shall explore an example of the recursive-function notation. A recursive function is a function F defined by a finite set of rules. Each rule specifies the value of the function F for certain arguments; the specification can use variables, nonnegative-integer constants, the successor (add one) function, the function F itself, and expressions built from these by composition of functions. For example, Ackermann’s function is defined by the rules:


1. A(0, y) = 1 for any y ≥ 0.
2. A(1, 0) = 2.
3. A(x, 0) = x + 2 for x ≥ 2.
4. A(x + 1, y + 1) = A(A(x, y + 1), y) for any x ≥ 0 and y ≥ 0.
Answer the following:
* a) Evaluate A(2, 1).
! b) What function of x is A(x, 2)?
! c) Evaluate A(4, 3).
Exercise 9.2.3: Informally describe multitape Turing machines that enumerate the following sets of integers, in the sense that, started with blank tapes, it prints on one of its tapes 10^{i_1}10^{i_2}1··· to represent the set {i_1, i_2, ...}.


* a) The set of all perfect squares {1,4,9,...}. 
b) The set of all primes {2,3,5,7,11,...}. 


!! c) The set of all i such that M_i accepts w_i. Hint: It is not possible to generate all these i’s in numerical order. The reason is that this language, which is L̄_d, is RE but not recursive. In fact, a definition of the RE-but-not-recursive languages is that they can be enumerated, but not in numerical order. The “trick” to enumerating them at all is that we have to simulate all M_i’s on w_i, but we cannot allow any M_i to run forever, since it would preclude trying any other M_j for j ≠ i as soon as we encountered some M_i that does not halt on w_i. Thus, we need to operate in rounds, where in the kth round we try only a limited set of M_i’s, and we do so for only a limited number of steps. Thus, each round can be completed in finite time. As long as for each TM M_i and for each number of steps s there is some round such that M_i will be simulated for at least s steps, then we shall eventually discover each M_i that accepts w_i and enumerate i.


* Exercise 9.2.4: Let L_1, L_2, ..., L_k be a collection of languages over alphabet Σ such that:


1. For all i ≠ j, L_i ∩ L_j = ∅; i.e., no string is in two of the languages.
2. L_1 ∪ L_2 ∪ ··· ∪ L_k = Σ*; i.e., every string is in one of the languages.
3. Each of the languages L_i, for i = 1, 2, ..., k, is recursively enumerable.


Prove that each of the languages is therefore recursive. 


! Exercise 9.2.5: Let L be recursively enumerable and let L̄ be non-RE. Consider the language

L' = {0w | w is in L} ∪ {1w | w is not in L}

Can you say for certain whether L' or its complement is recursive, RE, or non-RE? Justify your answer.


Exercise 9.2.6: We have not discussed closure properties of the recursive 
languages or the RE languages, other than our discussion of complementation 
in Section 9.2.2. Tell whether the recursive languages and/or the RE languages 
are closed under the following operations. You may give informal, but clear, 
constructions to show closure. 


a) Union.
b) Intersection.
c) Concatenation.
d) Kleene closure (star).
e) Homomorphism.
f) Inverse homomorphism.


9.3 Undecidable Problems About Turing Machines


We shall now use the languages L_u and L_d, whose status regarding decidability and recursive enumerability we know, to exhibit other undecidable or non-RE languages. The reduction technique will be exploited in each of these proofs.
Our first undecidable problems are all about Turing machines. In fact, our 
discussion in this section culminates with the proof of “Rice’s theorem,” which 
says that any nontrivial property of Turing machines that depends only on 
the language the TM accepts must be undecidable. Section 9.4 will let us 
investigate some undecidable problems that do not involve Turing machines or 
their languages. 


9.3.1 Reductions 


We introduced the notion of a reduction in Section 8.1.3. In general, if we have an algorithm to convert instances of a problem P_1 to instances of a problem P_2 that have the same answer, then we say that P_1 reduces to P_2. We can use such a reduction to show that P_2 is at least as hard as P_1. Thus, if P_1 is not recursive, then P_2 cannot be recursive. If P_1 is non-RE, then P_2 cannot be RE.
Figure 9.7: Reductions turn positive instances into positive, and negative to negative (yes-instances of P_1 map to yes-instances of P_2, no-instances to no-instances)


As we mentioned in Section 8.1.3, you must be careful to reduce a known hard 
problem to one you wish to prove to be at least as hard, never the opposite. 

As suggested in Fig. 9.7, a reduction must turn any instance of P_1 that has a “yes” answer into an instance of P_2 with a “yes” answer, and every instance of P_1 with a “no” answer must be turned into an instance of P_2 with a “no” answer. Note that it is not essential that every instance of P_2 be the target of one or more instances of P_1, and in fact it is quite common that only a small fraction of P_2 is a target of the reduction.

Formally, a reduction from P_1 to P_2 is a Turing machine that takes an instance of P_1 written on its tape and halts with an instance of P_2 on its tape. In practice, we shall generally describe reductions as if they were computer programs that take an instance of P_1 as input and produce an instance of P_2 as output. The equivalence of Turing machines and computer programs allows us to describe the reduction by either means. The importance of reductions is emphasized by the following theorem, of which we shall see numerous applications.


Theorem 9.7: If there is a reduction from P_1 to P_2, then:
a) If P_1 is undecidable then so is P_2.
b) If P_1 is non-RE, then so is P_2.


PROOF: First suppose P_1 is undecidable. If it is possible to decide P_2, then we can combine the reduction from P_1 to P_2 with the algorithm that decides P_2 to construct an algorithm that decides P_1. The idea was suggested in Fig. 8.7. In more detail, suppose we are given an instance w of P_1. Apply to w the algorithm that converts w into an instance x of P_2. Then apply the algorithm that decides P_2 to x. If that algorithm says “yes,” then x is in P_2. Because we reduced P_1 to P_2, we know the answer to w for P_1 is “yes”; i.e., w is in P_1. Likewise, if x is not in P_2 then w is not in P_1, and whatever answer we give to the question “is x in P_2?” is also the correct answer to “is w in P_1?”

We have thus contradicted the assumption that P_1 is undecidable. Our conclusion is that if P_1 is undecidable, then P_2 is also undecidable.

Now, consider part (b). Assume that P_1 is non-RE, but P_2 is RE. Now, we have an algorithm to reduce P_1 to P_2, but we have only a procedure to recognize P_2; that is, there is a TM that says “yes” if its input is in P_2 but may not halt if its input is not in P_2. As for part (a), starting with an instance w of P_1, convert it by the reduction algorithm to an instance x of P_2. Then apply the TM for P_2 to x. If x is accepted, then accept w.

This procedure describes a TM (which may not halt) whose language is P_1. If w is in P_1, then x is in P_2, so this TM will accept w. If w is not in P_1, then x is not in P_2. Then, the TM may or may not halt, but will surely not accept w. Since we assumed no TM for P_1 exists, we have shown by contradiction that no TM for P_2 exists either; i.e., if P_1 is non-RE, then P_2 is non-RE.
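Part (a) of the proof is function composition: run the reduction, then the decider for P_2. The Python sketch below uses a deliberately trivial pair of decidable problems chosen by us, since the point is only the plumbing; in the theorem, this same composition is what would exist if P_2 were decidable, and that is the contradiction. Part (b) composes the reduction with a recognizer in the same way. All names here are hypothetical.

```python
def decider_from_reduction(reduction, decide_p2):
    """Part (a): compose a reduction P1 -> P2 with a decider for P2."""
    return lambda w: decide_p2(reduction(w))


# Toy problems (our own, both decidable):
# P1 = binary numerals whose value is divisible by 3,
# P2 = strings of 1's whose length is divisible by 3.
def binary_to_unary(w: str) -> str:
    """Reduction: yes-instances map to yes-instances, no to no."""
    return "1" * int(w, 2)


def decide_p2(x: str) -> bool:
    return len(x) % 3 == 0


decide_p1 = decider_from_reduction(binary_to_unary, decide_p2)
print([decide_p1(w) for w in ["0", "11", "100", "110"]])  # [True, True, False, True]
```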


9.3.2 Turing Machines That Accept the Empty Language 


As an example of reductions involving Turing machines, let us investigate two languages called L_e and L_ne. Each consists of binary strings. If w is a binary string, then it represents some TM, M_i, in the enumeration of Section 9.1.2.

If L(M_i) = ∅, that is, M_i does not accept any input, then w is in L_e. Thus, L_e is the language consisting of all those encoded TM’s whose language is empty. On the other hand, if L(M_i) is not the empty language, then w is in L_ne. Thus, L_ne is the language of all codes for Turing machines that accept at least one input string.

In what follows, it is convenient to regard strings as the Turing machines 
they represent. Thus, we may define the two languages just mentioned as: 


• L_e = {M | L(M) = ∅}

• L_ne = {M | L(M) ≠ ∅}

Notice that L_e and L_ne are both languages over the binary alphabet {0,1}, and that they are complements of one another. We shall see that L_ne is the “easier” of the two languages; it is RE but not recursive. On the other hand, L_e is non-RE.

Theorem 9.8: L_ne is recursively enumerable.


PROOF: We have only to exhibit a TM that accepts L_ne. It is easiest to describe a nondeterministic TM M, whose plan is shown in Fig. 9.8. By Theorem 8.11, M can be converted to a deterministic TM.

The operation of M is as follows.


1. M takes as input a TM code M_i.


2. Using its nondeterministic capability, M guesses an input w that M_i might accept.




Figure 9.8: Construction of an NTM to accept L_ne (M guesses w and runs U on M_i and w, accepting if U accepts)


3. M tests whether M_i accepts w. For this part, M can simulate the universal TM U that accepts L_u.


4. If M_i accepts w, then M accepts its own input, which is M_i.


In this manner, if M_i accepts even one string, M will guess that string (among all others, of course), and accept M_i. However, if L(M_i) = ∅, then no guess w leads to acceptance by M_i, so M does not accept M_i. Thus, L(M) = L_ne.
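The guess can also be removed directly by a dovetailed search: in round k, simulate M_i on each of w_1, ..., w_k for k steps. Every (string, step count) pair is covered in some round, so any accepted string is eventually found. This is not the construction of Theorem 8.11 (which searches the NTM’s configurations breadth-first), but it has the same effect here. In this Python sketch the simulation is abstracted as a hypothetical callback `accepts_within(w, steps)` (for instance, U run with a step counter), and `max_rounds` exists only so the sketch can be tested on an empty language.

```python
from itertools import count


def ith_string(i):
    """w_i: the binary numeral for i with its leading 1 removed (Section 9.1.1)."""
    return bin(i)[3:]


def find_accepted_string(accepts_within, max_rounds=None):
    """Deterministically search for some w in L(M_i).

    accepts_within(w, s) is assumed to report whether M_i accepts w within s
    steps. In round k we try w_1..w_k for k steps each, so every (string,
    step count) pair is eventually covered. Runs forever if L(M_i) is empty,
    unless max_rounds is given (for testing).
    """
    rounds = count(1) if max_rounds is None else range(1, max_rounds + 1)
    for k in rounds:
        for i in range(1, k + 1):
            w = ith_string(i)
            if accepts_within(w, k):
                return w
    return None


# Toy stand-in (our own): "M_i" accepts only 101, and needs 5 steps to do so.
def toy(w, steps):
    return w == "101" and steps >= 5


print(find_accepted_string(toy))  # 101
```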


Our next step is to prove that L_ne is not recursive. To do so, we reduce L_u to L_ne. That is, we shall describe an algorithm that transforms an input (M,w) into an output M’, the code for another Turing machine, such that w is in L(M) if and only if L(M’) is not empty. That is, M accepts w if and only if M’ accepts at least one string. The trick is to have M’ ignore its input, and instead simulate M on input w. If M accepts, then M’ accepts its own input; thus acceptance of w by M is tantamount to L(M’) being nonempty. If L_ne were recursive, then we would have an algorithm to tell whether or not M accepts w: construct M’ and see whether L(M’) = ∅.


Theorem 9.9: L_ne is not recursive.


PROOF: We shall follow the outline of the proof given above. We must design an algorithm that converts an input that is a binary-coded pair (M,w) into a TM M’ such that L(M’) ≠ ∅ if and only if M accepts input w. The construction of M’ is sketched in Fig. 9.9. As we shall see, if M does not accept w, then M’ accepts none of its inputs; i.e., L(M’) = ∅. However, if M accepts w, then M’ accepts every input, and thus L(M’) surely is not ∅.


Figure 9.9: Plan of the TM M’ constructed from (M,w) in Theorem 9.9; M’ accepts arbitrary input if and only if M accepts w


M' is designed to do the following: 
1. M’ ignores its own input x. Rather, it replaces its input by the string that represents TM M and input string w. Since M’ is designed for a specific pair (M,w), which has some length n, we may construct M’ to have a sequence of states q_0, q_1, ..., q_n, where q_0 is the start state.


(a) In state q_i, for i = 0, 1, ..., n - 1, M’ writes the (i + 1)st bit of the code for (M,w), goes to state q_{i+1}, and moves right.

(b) In state q_n, M’ moves right, if necessary, replacing any nonblanks (which would be the tail of x, if that input to M’ is longer than n) by blanks.


2. When M’ reaches a blank in state q_n, it uses a similar collection of states to reposition its head at the left end of the tape.


3. Now, using additional states, M’ simulates a universal TM U on its 
present tape. 


4. If U accepts, then M' accepts. If U never accepts, then M’ never accepts 
either. 
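Steps 1(a) and 1(b) are purely mechanical, which is why constructing M’ from (M,w) is an algorithm. This Python sketch builds just those transitions for a given code string and runs them on an arbitrary input x. The state names ("q", i), the symbol set {0, 1, B}, moves written as +1, and the function names are our own choices, and steps 2 through 4 are omitted.

```python
BLANK = "B"


def overwrite_prefix(code, alphabet=("0", "1", BLANK)):
    """Transitions for steps 1(a)-(b) of M': states ("q", 0) .. ("q", n)."""
    n = len(code)
    delta = {}
    for i, bit in enumerate(code):
        for x in alphabet:
            # (a) in q_i, whatever is scanned, write bit i+1 of the code, move right
            delta[(("q", i), x)] = (("q", i + 1), bit, +1)
    for x in alphabet:
        if x != BLANK:
            # (b) in q_n, erase any remaining tail of the input x
            delta[(("q", n), x)] = (("q", n), BLANK, +1)
    return delta


def run_prefix(delta, x):
    """Run from ("q", 0) until no transition applies (q_n on a blank)."""
    tape = dict(enumerate(x))
    state, head = ("q", 0), 0
    while (state, tape.get(head, BLANK)) in delta:
        state, symbol, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += move
    contents = "".join(tape[p] for p in sorted(tape) if tape[p] != BLANK)
    return state, contents


code = "0111010"  # a hypothetical stand-in for the code of (M, w)
for x in ["", "1", "1111111111"]:
    print(run_prefix(overwrite_prefix(code), x))  # always (('q', 7), '0111010')
```

Whatever x is, the machine halts in q_n with exactly the code on its tape, which is the point of ignoring the input.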


The description of M’ above should be sufficient to convince you that you could design a Turing machine that would transform the code for M and the string w into the code for M’. That is, there is an algorithm to perform the reduction of L_u to L_ne. We also see that if M accepts w, then M’ accepts whatever input x was originally on its tape. The fact that x was ignored is irrelevant; the definition of acceptance by a TM says that whatever was placed on the tape, before commencing operation, is what the TM accepts. Thus, if M accepts w, then the code for M’ is in L_ne.

Conversely, if M does not accept w, then M’ never accepts, no matter what its input is. Hence, in this case the code for M’ is not in L_ne. We have successfully reduced L_u to L_ne by the algorithm that constructs M’ from M and w; we may conclude that, since L_u is not recursive, neither is L_ne. The existence of this reduction is sufficient to complete the proof. However, to illustrate the impact of the reduction, we shall take this argument one step further. If L_ne were recursive, then we could develop an algorithm for L_u as follows:


1. Convert (M,w) to the TM M’ as above.
2. Use the hypothetical algorithm for L_ne to tell whether or not L(M’) = ∅. If so, say M does not accept w; if L(M’) ≠ ∅, say M does accept w.


Since we know by Theorem 9.6 that no such algorithm for L_u exists, we have contradicted the assumption that L_ne is recursive, and conclude that L_ne is not recursive.


Now, we know the status of L_e. If L_e were RE, then by Theorem 9.4, both it and L_ne would be recursive. Since L_ne is not recursive by Theorem 9.9, we conclude that:


Theorem 9.10: L_e is not RE.


Why Problems and Their Complements are Different


Our intuition tells us that a problem and its complement are really the 
same problem. To solve one, we can use an algorithm for the other, and 
at the last step, complement the output: say “yes” instead of “no,” and 
vice-versa. That instinct is exactly right, as long as the problem and its 
complement are recursive. 

However, as we discussed in Section 9.2.2, there are two other possibilities.
First, neither the problem nor its complement is even RE. Then,
neither can be solved by any kind of TM at all, so in a sense the two are
again similar. However, the interesting case, typified by Le and Lne, is
when one is RE and the other is non-RE.

For the language that is RE, we can design a TM that takes an input 
w and searches for a reason why w is in the language. Thus, for Lne, 
given a TM M as input, we set our TM looking for strings that the TM 
M accepts, and as soon as we find one, we accept M. If M is a TM with 
an empty language, we never know for certain that M is not in Lne, but 
we never accept M, and that is the correct response by the TM. 

On the other hand, for the complement problem Le, which is not RE,
there is no way ever to accept all its strings. Suppose we are given a string
M that codes a TM whose language is empty. We can test inputs to the TM
M, and we may never find one that M accepts, yet we can never be sure
that there is no input, not yet tested, that M accepts.
Thus, M can never be accepted, even if it should be.
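The search for "a reason why M is in Lne" can be sketched as a dovetailed loop. This is a minimal sketch under a stated assumption: instead of a real TM code, the machine is modeled by a hypothetical Python predicate accepts_within(x, n) that reports whether M accepts x within n moves (a step-bounded simulation always halts). In round n the search runs M for n moves on every string of length at most n, so no single nonhalting run can block it.

```python
from itertools import product
from typing import Callable, Iterator, Optional


def strings_up_to(alphabet: str, n: int) -> Iterator[str]:
    """Yield every string over `alphabet` of length 0..n, shortest first."""
    for length in range(n + 1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def find_accepted_string(
    accepts_within: Callable[[str, int], bool],  # hypothetical step-bounded simulator
    alphabet: str = "01",
    max_rounds: Optional[int] = None,
) -> Optional[str]:
    """Semi-decide whether L(M) is nonempty by dovetailing over (input, step budget).

    Returns a witness as soon as some string is accepted. With max_rounds=None
    the loop never ends when L(M) is empty, which is how a TM that accepts Lne
    (but does not decide it) behaves.
    """
    n = 0
    while max_rounds is None or n <= max_rounds:
        for x in strings_up_to(alphabet, n):
            if accepts_within(x, n):
                return x
        n += 1
    return None  # reachable only when a round limit was supplied


# Toy machine (assumption): accepts only "101", and needs 7 moves to do so.
print(find_accepted_string(lambda x, n: x == "101" and n >= 7))  # 101
```

A machine with an empty language is never accepted; the round cap exists only so that a demonstration can stop.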


9.3.3 Rice’s Theorem and Properties of the RE Languages 


The fact that languages like Le and Lne are undecidable is actually a special case 
of a far more general theorem: all nontrivial properties of the RE languages are 
undecidable, in the sense that it is impossible to recognize by a Turing machine 
those binary strings that are codes for a TM whose language has the property. 
An example of a property of the RE languages is “the language is context free.” 
It is undecidable whether a given TM accepts a context-free language, as a 
special case of the general principle that all nontrivial properties of the RE 
languages are undecidable. 

A property of the RE languages is simply a set of RE languages. Thus, the 
property of being context-free is formally the set of all CFL’s. The property of 
being empty is the set {Ø} consisting of only the empty language. 

A property is trivial if it is either empty (i.e., satisfied by no language at 
all), or is all RE languages. Otherwise, it is nontrivial. 


• Note that the empty property, ∅, is different from the property of being
an empty language, {∅}.
We cannot recognize a set of languages as the languages themselves. The
reason is that the typical language, being infinite, cannot be written down as
a finite-length string that could be input to a TM. Rather, we must recognize
the Turing machines that accept those languages; the TM code itself is finite,
even if the language it accepts is infinite. Thus, if P is a property of the RE
languages, the language L_P is the set of codes for Turing machines Mi such that
L(Mi) is a language in P. When we talk about the decidability of a property
P, we mean the decidability of the language L_P.


Theorem 9.11: (Rice’s Theorem) Every nontrivial property of the RE lan- 
guages is undecidable. 


PROOF: Let P be a nontrivial property of the RE languages. Assume to begin 
that Ø, the empty language, is not in P; we shall return later to the opposite 
case. Since P is nontrivial, there must be some nonempty language L that is 
in P. Let M_L be a TM accepting L.

We shall reduce Lu to L_P, thus proving that L_P is undecidable, since Lu
is undecidable. The algorithm to perform the reduction takes as input a pair
(M, w) and produces a TM M'. The design of M' is suggested by Fig. 9.10;
L(M') is ∅ if M does not accept w, and L(M') = L if M accepts w.


[Diagram: w is fed to M; when M accepts, M_L is started on x; when M_L accepts, M' accepts.]

Figure 9.10: Construction of M' for the proof of Rice's Theorem


M' is a two-tape TM. One tape is used to simulate M on w. Remember 
that the algorithm performing the reduction is given M and w as input, and 
can use this input in designing the transitions of M’. Thus, the simulation of 
M on w is “built into” M'; the latter TM does not have to read the transitions 
of M on a tape of its own. 

The other tape of M' is used to simulate M_L on the input x to M', if
necessary. Again, the transitions of M_L are known to the reduction algorithm
and may be "built into" the transitions of M'. The TM M' is constructed to
do the following:


1. Simulate M on input w. Note that w is not the input to M'; rather, M' 
writes M and w onto one of its tapes and simulates the universal TM U 
on that pair, as in the proof of Theorem 9.8. 


2. If M does not accept w, then M' does nothing else. M' never accepts its
own input, x, so L(M') = ∅. Since we assume ∅ is not in property P, that
means the code for M' is not in L_P.
3. If M accepts w, then M' begins simulating M_L on its own input x. Thus,
M' will accept exactly the language L. Since L is in P, the code for M'
is in L_P.
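The key point, that M' is built from M, w, and M_L without running any of them, can be mimicked with closures. This is a toy model and an assumption made only for illustration: a "machine" here is a Python predicate on strings that always returns, whereas a real TM may run forever, which is exactly why L_P cannot be decided. The names build_m_prime and Machine are hypothetical.

```python
from typing import Callable

Machine = Callable[[str], bool]  # toy stand-in for a TM: True means "accepts"


def build_m_prime(m: Machine, w: str, m_l: Machine) -> Machine:
    """Return M' for the proof of Rice's Theorem (toy model).

    Nothing is simulated at construction time; the returned closure does
    its work only when M' is later run on its own input x.
    """
    def m_prime(x: str) -> bool:
        if not m(w):      # steps 1-2: simulate M on w; x is never accepted otherwise
            return False  # so L(M') is empty when M does not accept w
        return m_l(x)     # step 3: behave like M_L, so L(M') = L
    return m_prime


# Hypothetical example: M accepts only "ab"; M_L accepts even-length strings.
m = lambda s: s == "ab"
m_l = lambda s: len(s) % 2 == 0
print(build_m_prime(m, "ab", m_l)("xy"), build_m_prime(m, "ba", m_l)("xy"))  # True False
```

Because build_m_prime only wires its arguments together, it always returns; that is the sense in which the reduction is an algorithm even though M' itself may never halt.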


You should observe that constructing M' from M and w can be carried out by
an algorithm. Since this algorithm turns (M, w) into an M' that is in L_P if and
only if (M, w) is in Lu, this algorithm is a reduction of Lu to L_P, and proves
that the property P is undecidable.

We are not quite done. We need to consider the case where ∅ is in P. If
so, consider the complement property P̄, the set of RE languages that do not
have property P. P̄ is also nontrivial, and ∅ is not in P̄, so by the foregoing,
P̄ is undecidable. However, since every TM accepts an RE language, the
complement of L_P, that is, the set of (codes for) Turing machines that do
not accept a language in P, is the same as L_P̄, the set of TM's that accept a
language in P̄. Suppose L_P were decidable. Then so would be its complement
L_P̄, because the complement of a recursive language is recursive (Theorem 9.3).
That contradicts the undecidability of P̄, so L_P is undecidable in this case too.


9.3.4 Problems about Turing-Machine Specifications 


All problems about Turing machines that involve only the language that the 
TM accepts are undecidable, by Theorem 9.11. Some of these problems are 
interesting in their own right. For instance, the following are undecidable: 


1. Whether the language accepted by a TM is empty (which we knew from 
Theorems 9.9 and 9.3). 


2. Whether the language accepted by a TM is finite. 


3. Whether the language accepted by a TM is a regular language. 


4. Whether the language accepted by a TM is a context-free language. 


However, Rice’s Theorem does not imply that everything about a TM is 
undecidable. For instance, questions that ask about the states of the TM, 
rather than about the language it accepts, could be decidable. 


Example 9.12: It is decidable whether a TM has five states. The algorithm 
to decide this question simply looks at the code for the TM and counts the 
number of states that appear in any of its transitions. 

As another example, it is decidable whether there exists some input such 
that the TM makes at least five moves. The algorithm becomes obvious when 
we remember that if a TM makes five moves, then it does so looking only at 
the nine cells of its tape surrounding its initial head position. Thus, we may 
simulate the TM for five moves on any of the finite number of tapes consisting 
of five or fewer input symbols, preceded and followed by blanks. If any of these 
simulations fails to reach a halting situation, then we conclude that the TM 
makes at least five moves on some input. 
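Both decision procedures of Example 9.12 can be sketched directly. The sketch assumes a hypothetical encoding of the transition function as a Python dict mapping (state, symbol) to (new state, written symbol, "L" or "R"), with "B" as the blank; a missing entry means the TM halts. Only inputs of length at most five need to be tried, because five moves can read no input cell beyond the fifth.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

BLANK = "B"
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]  # assumed encoding of the TM


def count_states(delta: Delta) -> int:
    """Count the states that appear in any transition (as source or target)."""
    states = {q for (q, _) in delta}
    states |= {p for (p, _, _) in delta.values()}
    return len(states)


def moves_at_least_five(delta: Delta, start: str, tape_input: Iterable[str]) -> bool:
    """Simulate at most five moves; True if the TM has not halted before the fifth."""
    tape = dict(enumerate(tape_input))  # cell index -> symbol; absent cells are blank
    state, head = start, 0
    for _ in range(5):
        move = delta.get((state, tape.get(head, BLANK)))
        if move is None:
            return False  # halted after fewer than five moves
        state, written, direction = move
        tape[head] = written
        head += 1 if direction == "R" else -1
    return True


def some_input_makes_five_moves(delta: Delta, start: str, input_alphabet: str) -> bool:
    """Try every input of length 0..5; longer inputs cannot behave differently."""
    for length in range(6):
        for symbols in product(input_alphabet, repeat=length):
            if moves_at_least_five(delta, start, symbols):
                return True
    return False


# Hypothetical TM that can make at most three moves.
three_moves = {("q0", "0"): ("q1", "0", "R"),
               ("q1", "0"): ("q2", "0", "R"),
               ("q2", "0"): ("q3", "0", "R")}
print(count_states(three_moves), some_input_makes_five_moves(three_moves, "q0", "01"))  # 4 False
```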
9.3.5 Exercises for Section 9.3


Exercise 9.3.1: Show that the set of Turing-machine codes for TM’s that 
accept all inputs that are palindromes (possibly along with some other inputs) 
is undecidable. 


Exercise 9.3.2: The Big Computer Corp. has decided to bolster its sagging 
market share by manufacturing a high-tech version of the Turing machine, called 
BWTM, that is equipped with bells and whistles. The BWTM is basically the 
same as your ordinary Turing machine, except that each state of the machine is 
labeled either a “bell-state” or a “whistle-state.” Whenever the BWTM enters 
a new state, it either rings the bell or blows the whistle, depending on which 
type of state it has just entered. Prove that it is undecidable whether a given 
BWTM M, on given input w, ever blows the whistle. 


Exercise 9.3.3: Show that the language of codes for TM’s M that, when 
started with blank tape, eventually write a 1 somewhere on the tape is unde- 
cidable. 


Exercise 9.3.4: We know by Rice’s theorem that none of the following prob- 
lems are decidable. However, are they recursively enumerable, or non-RE? 


a) Does L(M) contain at least two strings? 
b) Is L(M) infinite? 
c) Is L(M) a context-free language? 

* d) Is L(M) = (L(M))*? 


Exercise 9.3.5: Let L be the language consisting of pairs of TM codes plus 
an integer, (M1, M2, k), such that L(M1) ∩ L(M2) contains at least k strings.
Show that L is RE, but not recursive. 


Exercise 9.3.6: Show that the following questions are decidable: 


* a) The set of codes for TM's M that, when started with blank tape,
will eventually write some nonblank symbol on the tape. Hint: If M has
m states, consider the first m transitions that it makes.


! b) The set of codes for TM’s that never make a move left on any input. 


! c) The set of pairs (M,w) such that TM M, started with input w, never 
scans any tape cell more than once. 


Exercise 9.3.7: Show that the following problems are not recursively enumer- 
able: 


* a) The set of pairs (M,w) such that TM M, started with input w, does not 
halt. 
b) The set of pairs (M1, M2) such that L(M1) ∩ L(M2) = ∅.


c) The set of triples (M1, M2, M3) such that L(M1) = L(M2)L(M3); i.e.,
the language of the first is the concatenation of the languages of the other 
two TM’s. 


Exercise 9.3.8: Tell whether each of the following is recursive, RE-but-not-recursive,
or non-RE.


a) The set of all TM codes for TM's that halt on every input.
b) The set of all TM codes for TM's that halt on no input.
c) The set of all TM codes for TM's that halt on at least one input.
d) The set of all TM codes for TM's that fail to halt on at least one input.


9.4  Post’s Correspondence Problem 


In this section, we begin reducing undecidable questions about Turing machines 
to undecidable questions about “real” things, that is, common matters that have 
nothing to do with the abstraction of the Turing machine. We begin with a 
problem called “Post’s Correspondence Problem” (PCP), which is still abstract, 
but it involves strings rather than Turing machines. Our goal is to prove this 
problem about strings to be undecidable, and then use its undecidability to 
prove other problems undecidable by reducing PCP to those. 

We shall prove PCP undecidable by reducing Lu to PCP. To facilitate the 
proof, we introduce a “modified” PCP, and reduce the modified problem to the 
original PCP. Then, we reduce Lu to the modified PCP. The chain of reductions
is suggested by Fig. 9.11. Since the original Lu is known to be undecidable, we 
conclude that PCP is undecidable. 


Lu --algorithm--> MPCP --algorithm--> PCP


Figure 9.11: Reductions proving the undecidability of Post’s Correspondence 
Problem 


9.4.1 Definition of Post’s Correspondence Problem 


An instance of Post's Correspondence Problem (PCP) consists of two lists of
strings over some alphabet Σ; the two lists must be of equal length. We generally
refer to the A and B lists, and write A = w1, w2, ..., wk and B = x1, x2, ..., xk,
for some integer k. For each i, the pair (wi, xi) is said to be a corresponding
pair.
We say this instance of PCP has a solution if there is a sequence of one or
more integers i1, i2, ..., im that, when interpreted as indexes for strings in the
A and B lists, yield the same string. That is, $w_{i_1} w_{i_2} \cdots w_{i_m} = x_{i_1} x_{i_2} \cdots x_{i_m}$.
We say the sequence i1, i2, ..., im is a solution to this instance of PCP, if so.
Post's correspondence problem is:


• Given an instance of PCP, tell whether this instance has a solution.


      List A | List B
  i   wi     | xi
  1   1      | 111
  2   10111  | 10
  3   10     | 0


Figure 9.12: An instance of PCP 


Example 9.13: Let Σ = {0, 1}, and let the A and B lists be as defined in
Fig. 9.12. In this case, PCP has a solution. For instance, let m = 4, i1 = 2,
i2 = 1, i3 = 1, and i4 = 3; i.e., the solution is the list 2, 1, 1, 3. We verify that
this list is a solution by concatenating the corresponding strings in order for
the two lists. That is, w2w1w1w3 = x2x1x1x3 = 101111110. Note this solution
is not unique. For instance, 2, 1, 1, 3, 2, 1, 1, 3 is another solution.
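Checking a proposed solution, as done in Example 9.13, is purely mechanical; the hard part of PCP is finding one. A minimal sketch, with the lists of Fig. 9.12 entered by hand and the helper names (concat, is_pcp_solution) chosen for illustration:

```python
from typing import Sequence


def concat(strings: Sequence[str], indexes: Sequence[int]) -> str:
    """Concatenate strings[i-1] for each 1-based index i, as in the text."""
    return "".join(strings[i - 1] for i in indexes)


def is_pcp_solution(A: Sequence[str], B: Sequence[str], indexes: Sequence[int]) -> bool:
    """A PCP solution is a nonempty index list whose A and B concatenations agree."""
    return len(indexes) > 0 and concat(A, indexes) == concat(B, indexes)


# The instance of Fig. 9.12.
A = ["1", "10111", "10"]
B = ["111", "10", "0"]
print(concat(A, [2, 1, 1, 3]), is_pcp_solution(A, B, [2, 1, 1, 3]))  # 101111110 True
```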


Example 9.14: Here is an example where there is no solution. Again we let
Σ = {0, 1}, but now the instance is the two lists given in Fig. 9.13.

Suppose that the PCP instance of Fig. 9.13 has a solution, say i1, i2, ..., im,
for some m ≥ 1. We claim i1 = 1. For if i1 = 2, then a string beginning
with w2 = 011 would have to equal a string that begins with x2 = 11. But
that equality is impossible, since the first symbols of these two strings are 0
and 1, respectively. Similarly, it is not possible that i1 = 3, since then a string
beginning with w3 = 101 would have to equal a string beginning with x3 = 011.

If i1 = 1, then the two corresponding strings from lists A and B would have
to begin:


A: 10···
B: 101···


Now, let us see what i2 could be.

1. If i2 = 1, then we have a problem, since no string beginning with w1w1 =
1010 can match a string that begins with x1x1 = 101101; they must
disagree at the fourth position.
PCP as a Language


Since we are discussing the problem of deciding whether a given instance 
of PCP has a solution, we need to express this problem as a language. As 
PCP allows instances to have arbitrary alphabets, the language PCP is 
really a set of strings over some fixed alphabet, which codes instances of 
PCP, much as we coded Turing machines that have arbitrary sets of states 
and tape symbols, in Section 9.1.2. For example, if a PCP instance has 
an alphabet with up to 2^k symbols, we can use distinct k-bit binary codes
for each of the symbols. 

Since each PCP instance has a finite alphabet, we can find some k 
for each instance. We can then code all instances in a 3-symbol alphabet 
consisting of 0, 1, and a “comma” symbol to separate strings. We begin 
the code by writing k in binary, followed by a comma. Then follow each of 
the pairs of strings, with strings separated by commas and their symbols 
coded in a k-bit binary code. 


      List A | List B
  i   wi     | xi
  1   10     | 101
  2   011    | 11
  3   101    | 011


Figure 9.13: Another PCP instance 


2. If i2 = 2, we again have a problem, because no string that begins with
w1w2 = 10011 can match a string that begins with x1x2 = 10111; they
must differ at the third position.


3. Only i2 = 3 is possible. 


If we choose i2 = 3, then the corresponding strings formed from the list of integers
1, 3 are:


A: 10101···
B: 101011···


There is nothing about these strings that immediately suggests we cannot extend
list 1, 3 to a solution. However, we can argue that it is not possible to do
so. The reason is that we are in the same condition we were in after choosing
i1 = 1. The string from the B list is the same as the string from the A list
except that in the B list there is an extra 1 at the end. Thus, we are forced
to choose i3 = 3, i4 = 3, and so on, to avoid creating a mismatch. We can
never allow the A string to catch up to the B string, and thus can never reach
a solution.


Partial Solutions


In Example 9.14 we used a technique for analyzing PCP instances that
comes up frequently. We considered what the possible partial solutions
were, that is, sequences of indexes i1, i2, ..., ir such that one of
$w_{i_1} w_{i_2} \cdots w_{i_r}$ and $x_{i_1} x_{i_2} \cdots x_{i_r}$ is a prefix of the other, although the two
strings are not equal. Notice that if a sequence of integers is a solution,
then every proper prefix of that sequence is either a partial solution or
itself a solution. Thus, understanding what the partial solutions are allows
us to argue about what solutions there might be.

Note, however, that because PCP is undecidable, there is no algorithm
to compute all the partial solutions. There can be an infinite number of
them, and worse, there is no upper bound on how different the lengths of
the strings $w_{i_1} w_{i_2} \cdots w_{i_r}$ and $x_{i_1} x_{i_2} \cdots x_{i_r}$ can be, even though the partial
solution leads to a solution.
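The partial-solution idea suggests a bounded search, sketched below as our own illustration rather than a method from the text. Each state records which list is ahead and by what surplus suffix; two index sequences that leave the same surplus have the same possible continuations, so each surplus is explored once, breadth-first, up to a caller-chosen number of indexes. Because PCP is undecidable, no bound is safe in general: a result of None means only that nothing was found within the bound.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

State = Tuple[str, str]  # (side that is ahead: "A" or "B", surplus suffix on that side)


def extend(state: State, a: str, b: str) -> Optional[State]:
    """Append one corresponding pair (a, b); None if the two strings now disagree."""
    side, surplus = state
    top, bottom = (surplus + a, b) if side == "A" else (a, surplus + b)
    if top.startswith(bottom):
        return ("A", top[len(bottom):])
    if bottom.startswith(top):
        return ("B", bottom[len(top):])
    return None


def shortest_pcp_solution(A: Sequence[str], B: Sequence[str], max_len: int) -> Optional[List[int]]:
    """Breadth-first search over partial solutions of at most max_len indexes."""
    queue = deque([(("A", ""), [])])
    seen = set()
    while queue:
        state, seq = queue.popleft()
        if len(seq) == max_len:
            continue
        for i, (a, b) in enumerate(zip(A, B), start=1):
            nxt = extend(state, a, b)
            if nxt is None:
                continue  # mismatch: not a partial solution
            if nxt[1] == "":
                return seq + [i]  # the two strings are now equal
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [i]))
    return None


print(shortest_pcp_solution(["1", "10111", "10"], ["111", "10", "0"], 10))  # [2, 1, 1, 3]
```

For the instance of Fig. 9.13, the only surplus ever reached is a single extra 1 on the B side, matching the argument of Example 9.14, and the search returns None.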


9.4.2 The “Modified” PCP 


It is easier to reduce Lu to PCP if we first introduce an intermediate version of
PCP, which we call the Modified Post’s Correspondence Problem, or MPCP. In 
the modified PCP, there is the additional requirement on a solution that the first 
pair on the A and B lists must be the first pair in the solution. More formally, 
an instance of MPCP is two lists A = w1, w2, ..., wk and B = x1, x2, ..., xk,
and a solution is a list of 0 or more integers i1, i2, ..., im such that


$$w_1 w_{i_1} w_{i_2} \cdots w_{i_m} = x_1 x_{i_1} x_{i_2} \cdots x_{i_m}$$


Notice that the pair (w1, x1) is forced to be at the beginning of the two
strings, even though the index 1 is not mentioned at the front of the list that
is the solution. Also, unlike PCP, where the solution has to have at least one
integer on the solution list, in MPCP, the empty list could be a solution if
w1 = x1 (but those instances are rather uninteresting and will not figure in our
use of MPCP).


Example 9.15: The lists of Fig. 9.12 may be regarded as an instance of MPCP. 
However, as an instance of MPCP it has no solution. In proof, observe that 
any partial solution has to begin with index 1, so the two strings of a solution 
would begin:


A: 1···
B: 111···


The next integer could not be 2 or 3, since both w2 and w3 begin with 10 and
thus would produce a mismatch at the third position. Thus, the next index
would have to be 1, yielding:


A: 11···
B: 111111···


We can argue this way indefinitely. Only another 1 in the solution can avoid a 
mismatch, but if we can only pick index 1, the B string remains three times as 
long as the A string, and the two strings can never become equal. 


An important step in showing PCP is undecidable is reducing MPCP to 
PCP. Later, we show MPCP is undecidable by reducing Lu to MPCP. At that 
point, we will have a proof that PCP is undecidable as well; if it were decidable, 
then we could decide MPCP, and thus Lu.

Given an instance of MPCP with alphabet Σ, we construct an instance of
PCP as follows. First, we introduce a new symbol * that, in the PCP instance,
goes between every symbol in the strings of the MPCP instance. However, in
the strings of the A list, the *'s follow the symbols of Σ, and in the B list, the
*'s precede the symbols of Σ. The one exception is a new pair that is based on
the first pair of the MPCP instance; this pair has an extra * at the beginning of
w1, so it can be used to start the PCP solution. A final pair ($, *$) is added to
the PCP instance. This pair serves as the last in a PCP solution that mimics
a solution to the MPCP instance.

Now, let us formalize the above construction. We are given an instance of
MPCP with lists A = w1, w2, ..., wk and B = x1, x2, ..., xk. We assume *
and $ are symbols not present in the alphabet Σ of this MPCP instance. We
construct a PCP instance C = y0, y1, ..., y_{k+1} and D = z0, z1, ..., z_{k+1}, as
follows:


1. For i = 1, 2, ..., k, let yi be wi with a * after each symbol of wi, and let
zi be xi with a * before each symbol of xi.


2. y0 = *y1, and z0 = z1. That is, the 0th pair looks like pair 1, except that
there is an extra * at the beginning of the string from the first list. Note
that the 0th pair will be the only pair in the PCP instance where both
strings begin with the same symbol, so any solution to this PCP instance
will have to begin with index 0.


3. y_{k+1} = $ and z_{k+1} = *$.
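The three steps translate directly into code. In this sketch (helper names are our own), list position j holds the PCP pair with index j, and the caller must ensure that * and $ do not occur in the MPCP strings. lift_solution performs the forward direction of the proof of Theorem 9.17, turning an MPCP solution i1, ..., im into 0, i1, ..., im, k + 1.

```python
from typing import List, Sequence, Tuple


def mpcp_to_pcp(A: Sequence[str], B: Sequence[str],
                star: str = "*", end: str = "$") -> Tuple[List[str], List[str]]:
    """Build C = y0..y_{k+1} and D = z0..z_{k+1} from MPCP lists A and B."""
    k = len(A)
    C = [""] * (k + 2)
    D = [""] * (k + 2)
    for i in range(1, k + 1):
        C[i] = "".join(c + star for c in A[i - 1])  # step 1: * after each symbol
        D[i] = "".join(star + c for c in B[i - 1])  # step 1: * before each symbol
    C[0], D[0] = star + C[1], D[1]                  # step 2: the starting pair
    C[k + 1], D[k + 1] = end, star + end            # step 3: the closing pair
    return C, D


def lift_solution(mpcp_solution: Sequence[int], k: int) -> List[int]:
    """Map MPCP solution i1..im to PCP solution 0, i1..im, k+1."""
    return [0, *mpcp_solution, k + 1]


C, D = mpcp_to_pcp(["1", "10111", "10"], ["111", "10", "0"])  # Fig. 9.12 as MPCP
print(C)  # ['*1*', '1*', '1*0*1*1*1*', '1*0*', '$']
```

As a further check, take the small instance A = 0, 10 and B = 01, 0 (our own example, whose MPCP solution is the single index 2): the lifted index list 0, 2, 3 spells *0*1*0*$ on both lists of the constructed PCP instance.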


Example 9.16: Suppose Fig. 9.12 is an MPCP instance. Then the instance 
of PCP constructed by the above steps is shown in Fig. 9.14. 


Theorem 9.17: MPCP reduces to PCP. 
      List C       | List D
  i   yi           | zi
  0   *1*          | *1*1*1
  1   1*           | *1*1*1
  2   1*0*1*1*1*   | *1*0
  3   1*0*         | *0
  4   $            | *$


Figure 9.14: Constructing an instance of PCP from an MPCP instance


PROOF: The construction given above is the heart of the proof. First, suppose
that i1, i2, ..., im is a solution to the given MPCP instance with lists A and B.
Then we know $w_1 w_{i_1} w_{i_2} \cdots w_{i_m} = x_1 x_{i_1} x_{i_2} \cdots x_{i_m}$. If we were to replace the
w's by y's and the x's by z's, we would have two strings that were almost the
same: $y_1 y_{i_1} y_{i_2} \cdots y_{i_m}$ and $z_1 z_{i_1} z_{i_2} \cdots z_{i_m}$. The difference is that the first string
would be missing a * at the beginning, and the second would be missing a * at
the end. That is,


$$* y_1 y_{i_1} y_{i_2} \cdots y_{i_m} = z_1 z_{i_1} z_{i_2} \cdots z_{i_m} *$$


However, y0 = *y1, and z0 = z1, so we can fix the initial * by replacing the
first index by 0. We then have:


$$y_0 y_{i_1} y_{i_2} \cdots y_{i_m} = z_0 z_{i_1} z_{i_2} \cdots z_{i_m} *$$


We can take care of the final * by appending the index k + 1. Since y_{k+1} = $,
and z_{k+1} = *$, we have:


$$y_0 y_{i_1} y_{i_2} \cdots y_{i_m} y_{k+1} = z_0 z_{i_1} z_{i_2} \cdots z_{i_m} z_{k+1}$$


We have thus shown that 0, i1, i2, ..., im, k + 1 is a solution to the instance of
PCP.

Now, we must show the converse, that if the constructed instance of PCP
has a solution, then the original MPCP instance has a solution as well. We
observe that a solution to the PCP instance must begin with index 0 and end
with index k + 1, since only the 0th pair has strings y0 and z0 that begin with
the same symbol, and only the (k + 1)st pair has strings that end with the same
symbol. Thus, the PCP solution can be written 0, i1, i2, ..., im, k + 1.

We claim that i1, i2, ..., im is a solution to the MPCP instance. The reason
is that if we remove the *'s and the final $ from the string $y_0 y_{i_1} y_{i_2} \cdots y_{i_m} y_{k+1}$
we get the string $w_1 w_{i_1} w_{i_2} \cdots w_{i_m}$. Also, if we remove the *'s and $ from the
string $z_0 z_{i_1} z_{i_2} \cdots z_{i_m} z_{k+1}$ we get $x_1 x_{i_1} x_{i_2} \cdots x_{i_m}$. We know that


$$y_0 y_{i_1} y_{i_2} \cdots y_{i_m} y_{k+1} = z_0 z_{i_1} z_{i_2} \cdots z_{i_m} z_{k+1}$$


so it follows that


$$w_1 w_{i_1} w_{i_2} \cdots w_{i_m} = x_1 x_{i_1} x_{i_2} \cdots x_{i_m}$$

Thus, a solution to the PCP instance implies a solution to the MPCP instance.

We now see that the construction described prior to this theorem is an 
algorithm that converts an instance of MPCP with a solution to an instance of 
PCP with a solution, and also converts an instance of MPCP with no solution 
to an instance of PCP with no solution. Thus, there is a reduction of MPCP 
to PCP, which confirms that if PCP were decidable, MPCP would also be 
decidable. 
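The converse half of the proof, deleting the added *'s and $ and dropping the first and last indexes, is equally mechanical. A minimal sketch with hypothetical helper names, checked on strings from the small instance A = 0, 10 and B = 01, 0 (our own example) and on y2 of Fig. 9.14:

```python
from typing import List, Sequence


def strip_markers(s: str, star: str = "*", end: str = "$") -> str:
    """Delete the added * and $ symbols, recovering a string over the MPCP alphabet."""
    return s.replace(star, "").replace(end, "")


def lower_solution(pcp_solution: Sequence[int], k: int) -> List[int]:
    """Map PCP solution 0, i1..im, k+1 back to MPCP solution i1..im."""
    if pcp_solution[0] != 0 or pcp_solution[-1] != k + 1:
        raise ValueError("a solution of the constructed instance starts with 0 and ends with k+1")
    return list(pcp_solution[1:-1])


print(strip_markers("*0*1*0*$"), lower_solution([0, 2, 3], 2))  # 010 [2]
```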


9.4.3 Completion of the Proof of PCP Undecidability 


We now complete the chain of reductions of Fig. 9.11 by reducing Lu to MPCP. 
That is, given a pair (M,w), we construct an instance (A, B) of MPCP such 
that TM M accepts input w if and only if (A, B) has a solution. 

The essential idea is that MPCP instance (A, B) simulates, in its partial 
solutions, the computation of M on input w. That is, partial solutions will consist
of strings that are prefixes of the sequence of ID's of M: #α1#α2#α3#···,
where α1 is the initial ID of M with input w, and αi ⊢ αi+1 for all i. The string
from the B list will always be one ID ahead of the string from the A list, unless 
M enters an accepting state. In that case, there will be pairs to use that will 
allow the A list to “catch up” to the B list and eventually produce a solution. 
However, without entering an accepting state, there is no way that these pairs 
can be used, and no solution exists. 

To simplify the construction of an MPCP instance, we shall invoke Theo- 
rem 8.12, which says that we may assume our TM never prints a blank, and 
never moves left from its initial head position. In that case, an ID of the Turing 
machine will always be a string of the form αqβ, where α and β are strings of
nonblank tape symbols, and q is a state. However, we shall allow β to be empty
if the head is at the blank immediately to the right of α, rather than placing a
blank to the right of the state. Thus, the symbols of α and β will correspond
exactly to the contents of the cells that held the input, plus any cells to the
right that the head has previously visited.

Let M = (Q, Σ, Γ, δ, q0, B, F) be a TM satisfying Theorem 8.12, and let w
in Σ* be an input string. We construct an instance of MPCP as follows. To
understand the motivation behind our choice of pairs, remember that the goal 
is for the first list to be one ID behind the second list, unless M accepts. 


1. The first pair is: 


List A    List B
#         #q0w#


This pair, which must start any solution according to the rules of MPCP, 
begins the simulation of M on input w. Notice that initially, the B list is 
a complete ID ahead of the A list. 
2. Tape symbols and the separator # can be appended to both lists. The
pairs


List A    List B
X         X         for each X in Γ
#         #


allow symbols not involving the state to be “copied.” In effect, choice of 
these pairs lets us extend the A string to match the B string, and at the 
same time copy parts of the previous ID to the end of the B string. So 
doing helps to form the next ID in the sequence of moves of M, at the 
end of the B string. 


3. To simulate a move of M, we have certain pairs that reflect those moves.
For all q in Q − F (i.e., q is a nonaccepting state), p in Q, and X, Y, and
Z in Γ we have:


List A    List B

qX        Yp        if δ(q, X) = (p, Y, R)
ZqX       pZY       if δ(q, X) = (p, Y, L); Z is any tape symbol
q#        Yp#       if δ(q, B) = (p, Y, R)
Zq#       pZY#      if δ(q, B) = (p, Y, L); Z is any tape symbol

Like the pairs of (2), these pairs help extend the B string to add the next 
ID, by extending the A string to match the B string. However, these pairs 
use the state to determine the change in the current ID that is needed 
to produce the next ID. These changes — a new state, tape symbol, and 
head move — are reflected in the ID being constructed at the end of the 
B string. 


4. If the ID at the end of the B string has an accepting state, then we need
to allow the partial solution to become a complete solution. We do so by
extending with "ID's" that are not really ID's of M, but represent what
would happen if the accepting state were allowed to consume all the tape
symbols to either side of it. Thus, if q is an accepting state, then for all
tape symbols X and Y, there are pairs:


List A    List B
XqY       q
Xq        q
qY        q


5. Finally, once the accepting state has consumed all tape symbols, it stands
alone as the last ID on the B string. That is, the remainder of the two
strings (the suffix of the B string that must be appended to the A string
to match the B string) is q#. We use the final pair:


List A    List B
q##       #


to complete the solution.


In what follows, we refer to the five kinds of pairs generated above as the pairs 
from rule (1), rule (2), and so on. 
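The five rules can be turned into a pair generator. This sketch makes several assumptions of its own: δ is encoded as a dict mapping (state, symbol) to (state, symbol, "L" or "R") with "B" for the blank; strings of the two lists are tuples of tokens, so that multi-character state names such as q1 stay unambiguous; and tape_symbols lists only the nonblank symbols, which suffices because, by Theorem 8.12, B never appears in an ID. Applied to the TM of Example 9.18, it yields 22 pairs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (List A entry, List B entry)
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]


def mpcp_pairs(delta: Delta, start: str, accepting: Set[str],
               tape_symbols: Iterable[str], w: str) -> List[Pair]:
    """MPCP pairs from rules (1)-(5); the first pair is the one from rule (1)."""
    H = "#"
    gamma = list(tape_symbols)                                  # nonblank symbols only
    pairs: List[Pair] = [((H,), (H, start, *w, H))]             # rule (1)
    pairs += [((X,), (X,)) for X in gamma] + [((H,), (H,))]     # rule (2)
    for (q, X), (p, Y, direction) in delta.items():             # rule (3)
        if q in accepting:
            continue  # rule (3) covers nonaccepting states only
        if X == "B":  # head sits just past the right end of the ID
            if direction == "R":
                pairs.append(((q, H), (Y, p, H)))
            else:
                pairs += [((Z, q, H), (p, Z, Y, H)) for Z in gamma]
        elif direction == "R":
            pairs.append(((q, X), (Y, p)))
        else:
            pairs += [((Z, q, X), (p, Z, Y)) for Z in gamma]
    for q in sorted(accepting):                                 # rules (4) and (5)
        pairs += [((X, q, Y), (q,)) for X in gamma for Y in gamma]
        pairs += [((X, q), (q,)) for X in gamma]
        pairs += [((q, Y), (q,)) for Y in gamma]
        pairs.append(((q, H, H), (H,)))
    return pairs


# The TM of Example 9.18 on input 01.
delta = {("q1", "0"): ("q2", "1", "R"), ("q1", "1"): ("q2", "0", "L"),
         ("q1", "B"): ("q2", "1", "L"), ("q2", "0"): ("q3", "0", "L"),
         ("q2", "1"): ("q1", "0", "R"), ("q2", "B"): ("q2", "0", "R")}
pairs = mpcp_pairs(delta, "q1", {"q3"}, ["0", "1"], "01")
print(len(pairs))  # 22
```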


Example 9.18: Let us convert the TM 


M = ({q1, q2, q3}, {0, 1}, {0, 1, B}, δ, q1, B, {q3})


where δ is given by:


 δ  |     0      |     1      |     B
 q1 | (q2, 1, R) | (q2, 0, L) | (q2, 1, L)
 q2 | (q3, 0, L) | (q1, 0, R) | (q2, 0, R)
 q3 |     -      |     -      |     -


and input string w = 01 to an instance of MPCP. To simplify, notice that M 
never writes a blank, so we shall never have B in an ID. Thus, we shall omit 
all the pairs that involve B. The entire list of pairs is in Fig. 9.15, along with 
explanations about where each pair comes from. 

Note that M accepts the input 01 by the sequence of moves

q101 ⊢ 1q21 ⊢ 10q1 ⊢ 1q201 ⊢ q3101
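This move sequence can be reproduced by a small simulator that prints IDs in the αqβ form of this section. The sketch assumes the dict encoding of δ used for illustration here (mapping (state, symbol) to (state, symbol, "L" or "R"), "B" for the blank) and relies on Theorem 8.12: the TM never writes B and never moves left of its starting cell, so the tape is a Python list of nonblank symbols, and reading past its right end means reading a blank.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[str, str], Tuple[str, str, str]]


def trace_ids(delta: Delta, start: str, accepting: Set[str], w: str,
              max_moves: int = 100) -> Tuple[List[str], bool]:
    """Return the IDs alpha q beta visited on input w, and whether M accepted."""
    tape, head, state = list(w), 0, start

    def current_id() -> str:
        # beta is empty when the head is on the blank just right of alpha
        return "".join(tape[:head]) + state + "".join(tape[head:])

    ids: List[str] = []
    for _ in range(max_moves):
        ids.append(current_id())
        if state in accepting:
            return ids, True
        symbol = tape[head] if head < len(tape) else "B"
        move = delta.get((state, symbol))
        if move is None:
            return ids, False  # halted without accepting
        state, written, direction = move
        if head == len(tape):
            tape.append(written)  # first visit to this cell
        else:
            tape[head] = written
        head += 1 if direction == "R" else -1
        if head < 0:
            raise ValueError("M moved left of its initial cell (violates Theorem 8.12)")
    ids.append(current_id())
    return ids, state in accepting


delta = {("q1", "0"): ("q2", "1", "R"), ("q1", "1"): ("q2", "0", "L"),
         ("q1", "B"): ("q2", "1", "L"), ("q2", "0"): ("q3", "0", "L"),
         ("q2", "1"): ("q1", "0", "R"), ("q2", "B"): ("q2", "0", "R")}
ids, accepted = trace_ids(delta, "q1", {"q3"}, "01")
print(" |- ".join(ids), accepted)  # q101 |- 1q21 |- 10q1 |- 1q201 |- q3101 True
```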


Let us see the sequence of partial solutions that mimics this computation of M 
and eventually leads to a solution. We must start with the first pair, as required 
in any solution to MPCP: 


A: # 
B: #q101#


The only way to extend the partial solution is for the string from the A list 
to be a prefix of the remainder, q101#. Thus, we must next choose the pair
(q10, 1q2), which is one of those move-simulating pairs that we got from rule (3).
The partial solution is thus:


A: #q10
B: #q101#1q2


We may now further extend the partial solution using the “copying” pairs from 
rule (2), until we get to the state in the second ID. The partial solution is then: 


A: #q101#1
B: #q101#1q21#1
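Rule   List A   List B    Source
(1)    #        #q101#
(2)    0        0
       1        1
       #        #
(3)    q10      1q2       from δ(q1, 0) = (q2, 1, R)
       0q11     q200      from δ(q1, 1) = (q2, 0, L)
       1q11     q210      from δ(q1, 1) = (q2, 0, L)
       0q1#     q201#     from δ(q1, B) = (q2, 1, L)
       1q1#     q211#     from δ(q1, B) = (q2, 1, L)
       0q20     q300      from δ(q2, 0) = (q3, 0, L)
       1q20     q310      from δ(q2, 0) = (q3, 0, L)
       q21      0q1       from δ(q2, 1) = (q1, 0, R)
       q2#      0q2#      from δ(q2, B) = (q2, 0, R)
(4)    0q30     q3
       0q31     q3
       1q30     q3
       1q31     q3
       0q3      q3
       1q3      q3
       q30      q3
       q31      q3
(5)    q3##     #


Figure 9.15: MPCP instance constructed from TM M of Example 9.18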


At this point, we can use another of the rule-(3) pairs to simulate a move; the 
appropriate pair is (q21, 0q1), and the resulting partial solution is:


A: #q101#1q21
B: #q101#1q21#10q1


We now could use rule-(2) pairs to “copy” the next three symbols: #, 1, and 0. 
However, to go that far would be a mistake, since the next move of M moves 
the head left, and the 0 just before the state is needed in the next rule-(3) pair. 
Thus, we only “copy” the next two symbols, leaving partial solution: 


A: #q101#1q21#1
B: #q101#1q21#10q1#1


The appropriate rule-(3) pair to use is (0q1#, q201#), which gives us the partial 
solution: 


A: #q101#1q21#10q1#
B: #q101#1q21#10q1#1q201#
Now, we may use another rule-(3) pair, (1q20, q310), which leads to acceptance:


A: #q101#1q21#10q1#1q20
B: #q101#1q21#10q1#1q201#q310


At this point, we use pairs from rule (4) to eliminate all but q3 from the ID. We 
also need pairs from rule (2) to copy symbols as necessary. The continuation of 
the partial solution is: 


A: #q101#1q21#10q1#1q201#q3101#q301#q31#
B: #q101#1q21#10q1#1q201#q3101#q301#q31#q3#


With only q3 left in the ID, we can use the pair (q3##, #) from rule (5) to
finish the solution:


A: #q101#1q21#10q1#1q201#q3101#q301#q31#q3##
B: #q101#1q21#10q1#1q201#q3101#q301#q31#q3##


Theorem 9.19: Post’s Correspondence Problem is undecidable. 


PROOF: We have almost completed the chain of reductions suggested by Fig. 
9.11. The reduction of MPCP to PCP was shown in Theorem 9.17. The construction
of this section shows how to reduce Lu to MPCP. Thus, we complete
the proof of undecidability of PCP by proving that the construction is correct, 
that is: 


• M accepts w if and only if the constructed MPCP instance has a solution.


(Only-if) Example 9.18 gives the fundamental idea. If w is in L(M), then we 
can start with the pair from rule (1), and simulate the computation of M on 
w. We use a pair from rule (3) to copy the state from each ID and simulate 
one move of M, and we use the pairs from rule (2) to copy tape symbols and 
the marker # as needed. If M reaches an accepting state, then the pairs from 
rule (4) and a final use of the pair from rule (5) allow the A string to catch up 
to the B string and form a solution. 


(If) We need to argue that if the MPCP instance has a solution, it could only be 
because M accepts w. First, because we are dealing with MPCP, any solution 
must begin with the first pair, so a partial solution begins 


A: # 
B: #q0w#


As long as there is no accepting state in the partial solution, the pairs from 
rules (4) and (5) are useless. States and one or two of their surrounding tape 
symbols in an ID can only be handled by the pairs of rule (3), and all other 
tape symbols and # must be handled by pairs from rule (2). Thus, unless M 
reaches an accepting state, all partial solutions have the form 


A: x
B: xy


where x is a sequence of ID's of M representing a computation of M on input
w, possibly followed by # and the beginning of the next ID α. The remainder
y is the completion of α, another #, and the beginning of the ID that follows
α, up to the point that x ended within α itself.

In particular, as long as M does not enter an accepting state, the partial 
solution is not a solution; the B string is longer than the A string. Thus, if 
there is a solution, M must at some point enter an accepting state; i.e., M 
accepts w. 


9.4.4 Exercises for Section 9.4 


Exercise 9.4.1: Tell whether each of the following instances of PCP has a 
solution. Each is presented as two lists A and B, and the ith strings on the two 
lists correspond for each i = 1,2,.... 


* a) A = (01, 001, 10); B = (011, 10, 00).
b) A = (01, 001, 10); B = (011, 01, 00).
c) A = (ab, a, bc, c); B = (bc, ab, ca, a).


Exercise 9.4.2: We showed that PCP was undecidable, but we assumed that
the alphabet $\Sigma$ could be arbitrary. Show that PCP is undecidable even if we
limit the alphabet to $\Sigma = \{0, 1\}$ by reducing PCP to this special case of PCP.


! Exercise 9.4.3: Suppose we limited PCP to a one-symbol alphabet, say $\Sigma = \{0\}$.
Would this restricted case of PCP still be undecidable?


Exercise 9.4.4: A Post tag system consists of a set of pairs of strings chosen
from some finite alphabet $\Sigma$ and a start string. If $(w, x)$ is a pair, and y is
any string over $\Sigma$, we say that $wy \vdash yx$. That is, on one move, we can remove
some prefix w of the “current” string wy and instead add at the end the string x
with which w is paired. Define $\stackrel{*}{\vdash}$ to mean zero or more steps of $\vdash$,
just as for derivations in a context-free grammar. Show that it is undecidable,
given a set of pairs P and a start string z, whether $z \stackrel{*}{\vdash} \epsilon$.
Hint: For each TM M and input w, let z be the initial ID of M with input w,
followed by a separator symbol #. Select the pairs P such that any ID of M
must eventually become the ID that follows by one move of M. If M enters an
accepting state, arrange that the current string can eventually be erased, i.e.,
reduced to $\epsilon$.
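To make the move relation concrete, here is a minimal Python sketch of a Post tag system. The function names `tag_moves` and `erasable_within` are hypothetical, strings are plain Python `str` values, and pairs are `(w, x)` tuples. The search is deliberately bounded by a number of moves: a `False` answer only means "not within the bound," and the unbounded question is the one the exercise asks you to prove undecidable.

```python
from collections import deque


def tag_moves(current, pairs):
    """Yield each string reachable from `current` in one move wy |- yx."""
    for w, x in pairs:
        if current.startswith(w):
            # Remove the prefix w and append the string x paired with it.
            yield current[len(w):] + x


def erasable_within(start, pairs, max_steps):
    """Breadth-first search: can `start` reach the empty string in at most
    `max_steps` moves?  False only means "not within this bound"; the
    unbounded question is the undecidable one."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        current, steps = frontier.popleft()
        if current == "":
            return True
        if steps == max_steps:
            continue
        for nxt in tag_moves(current, pairs):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return False
```

For example, with the hypothetical pairs ("a", "b") and ("b", ""), the start string "ab" moves to "bb", then to "b", then to the empty string, so it is erasable in three moves but not in two.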


9.5 Other Undecidable Problems 


Now, we shall consider a variety of other problems that we can prove undecid- 
able. The principal technique is reducing PCP to the problem we wish to prove 
undecidable. 
9.5.1 Problems About Programs


Our first observation is that we can write a program, in any conventional
language, that takes as input an instance of PCP and searches for solutions in
some systematic manner, e.g., in order of the length (number of pairs) of potential
solutions. Since PCP allows arbitrary alphabets, we should encode the symbols 
of its alphabet in binary or some other fixed alphabet, as discussed in the box 
on “PCP as a Language” in Section 9.4.1. 
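As an illustration, here is a minimal Python sketch of such a program. The name `search_pcp` is hypothetical, and for simplicity the symbols are Python characters rather than the binary encoding the text mentions. It explores index sequences breadth-first, so shorter candidate solutions are tried before longer ones, and it discards any partial sequence whose A and B strings already disagree, since no extension of it can become a solution.

```python
from collections import deque


def search_pcp(a_list, b_list, max_len=None):
    """Search for a PCP solution in order of length (number of pairs).

    Returns a list of 1-based indexes i1, ..., im with
    w_i1 ... w_im == x_i1 ... x_im, or None if the search space is
    exhausted.  With max_len=None the search may run forever when the
    instance has no solution.
    """
    queue = deque([((), "", "")])  # (indexes so far, A string, B string)
    while queue:
        seq, a_str, b_str = queue.popleft()
        if max_len is not None and len(seq) >= max_len:
            continue
        for i, (w, x) in enumerate(zip(a_list, b_list)):
            new_a, new_b = a_str + w, b_str + x
            if new_a == new_b:
                return [j + 1 for j in seq + (i,)]
            # Keep only partial solutions in which one string is a prefix
            # of the other; no extension of any other sequence can succeed.
            if new_a.startswith(new_b) or new_b.startswith(new_a):
                queue.append((seq + (i,), new_a, new_b))
    return None
```

For instance, on the lists A = (1, 10111, 10) and B = (111, 10, 0) it returns the solution 2, 1, 1, 3, since both concatenations equal 101111110. Replacing the `return` of a solution with any other action, such as printing hello, world, gives a program that performs that action if and only if the instance has a solution.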

We can have our program do any particular thing we want, e.g., halt or 
print hello, world, when and if it finds a solution. Otherwise, the program 
will never perform that particular action. Thus, it is undecidable whether a 
program prints hello, world, whether it halts, whether it calls a particular 
function, rings the console bell, or takes any other nontrivial action. In fact,
there is an analog of Rice’s Theorem for programs: any nontrivial property that 
involves what the program does (rather than a lexical or syntactic property of 
the program itself) must be undecidable. 


9.5.2 Undecidability of Ambiguity for CFG’s 


Programs are sufficiently like Turing machines that the observations of Sec- 
tion 9.5.1 are unsurprising. Now, we shall see how to reduce PCP to a problem 
that looks nothing like a question about computers: the question of whether a 
given context-free grammar is ambiguous. 

The key idea is to consider strings that represent a list of indexes (integers), 
in reverse, and the corresponding strings according to one of the lists of a 
PCP instance. These strings can be generated by a grammar. The similar set 
of strings for the other list in the PCP instance can also be generated by a 
grammar. If we take the union of these grammars in the obvious way, then 
there is a string generated through the productions of each original grammar if 
and only if there is a solution to this PCP instance. Thus, there is a solution if 
and only if there is ambiguity in the grammar for the union. 

Let us now make these ideas more precise. Let the PCP instance consist of
lists $A = w_1, w_2, \ldots, w_k$ and $B = x_1, x_2, \ldots, x_k$. For list A we shall construct
a CFG with A as the only variable. The terminals are all the symbols of the
alphabet $\Sigma$ used for this PCP instance, plus a distinct set of index symbols
$a_1, a_2, \ldots, a_k$ that represent the choices of pairs of strings in a solution to the
PCP instance. That is, the index symbol $a_i$ represents the choice of $w_i$ from
the A list or $x_i$ from the B list. The productions for the CFG for the A list are:


$$A \to w_1Aa_1 \mid w_2Aa_2 \mid \cdots \mid w_kAa_k \mid w_1a_1 \mid w_2a_2 \mid \cdots \mid w_ka_k$$


We shall call this grammar $G_A$ and its language $L_A$. In the future, we shall
refer to a language like $L_A$ as the language for the list A.

Notice that the terminal strings derived by $G_A$ are all those of the form
$w_{i_1}w_{i_2}\cdots w_{i_m}a_{i_m}\cdots a_{i_2}a_{i_1}$ for some $m \ge 1$ and list of integers $i_1, i_2, \ldots, i_m$;
each integer is in the range 1 to k. The sentential forms of $G_A$ all have a single
A between the strings (the w’s) and the index symbols (the a’s), until we use
one of the last group of k productions, none of which has an A in the body.
Thus, parse trees look like the one suggested in Fig. 9.16.


Figure 9.16: The form of parse trees in the grammar $G_A$


Observe also that any terminal string derivable from A in $G_A$ has a unique
derivation. The index symbols at the end of the string determine uniquely
which production must be used at each step. That is, only two production
bodies end with a given index symbol $a_i$: $A \to w_iAa_i$ and $A \to w_ia_i$. We must
use the first of these if the derivation step is not the last, and we must use the
second production if it is the last step.

Now, let us consider the other part of the given PCP instance, the list
$B = x_1, x_2, \ldots, x_k$. For this list we develop another grammar $G_B$:

$$B \to x_1Ba_1 \mid x_2Ba_2 \mid \cdots \mid x_kBa_k \mid x_1a_1 \mid x_2a_2 \mid \cdots \mid x_ka_k$$


The language of this grammar will be referred to as $L_B$. The same observations
that we made for $G_A$ apply also to $G_B$. In particular, a terminal string in $L_B$
has a unique derivation, which can be determined by the index symbols in the
tail of the string.

Finally, we combine the languages and grammars of the two lists to form a
grammar $G_{AB}$ for the entire PCP instance. $G_{AB}$ consists of:

1. Variables A, B, and S; the latter is the start symbol.
2. Productions $S \to A \mid B$.
3. All the productions of $G_A$.
4. All the productions of $G_B$.

We claim that $G_{AB}$ is ambiguous if and only if the instance (A, B) of PCP has
a solution; that argument is the core of the next theorem.
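The role of the index tail can be seen in a short Python sketch (a hypothetical illustration, not part of the construction). Strings over $\Sigma \cup I$ are represented as lists whose $\Sigma$ symbols are one-character strings and whose index symbol $a_i$ is the integer i. `list_string` builds the terminal string derived from a list of indexes, and `derivation_indexes` reads the tail to recover the only possible derivation in a list grammar, returning None when the string is not in that list language.

```python
def list_string(words, indexes):
    """Terminal string w_i1 ... w_im a_im ... a_i1 for 1-based indexes.

    Symbols of Sigma are one-character strings; index symbol a_i is the int i.
    """
    return list("".join(words[i - 1] for i in indexes)) + list(reversed(indexes))


def derivation_indexes(words, tokens):
    """Recover the unique derivation in the list grammar, or None.

    The index symbols at the end of the string fix the productions:
    the last symbol a_i1 picks the first step, and so on inward.
    """
    split = 0
    while split < len(tokens) and isinstance(tokens[split], str):
        split += 1
    prefix, tail = "".join(tokens[:split]), tokens[split:]
    if not tail or not all(isinstance(t, int) and 1 <= t <= len(words) for t in tail):
        return None  # wrong form: no index tail, or a Sigma symbol after it
    indexes = list(reversed(tail))
    # Each step A -> w_i A a_i contributes w_i on the left, in order i1..im.
    if "".join(words[i - 1] for i in indexes) != prefix:
        return None
    return indexes
```

With A = (1, 10111, 10) and B = (111, 10, 0), the string built from indexes 2, 1, 1, 3 has a derivation in $G_A$ and also one in $G_B$, so in $G_{AB}$ it has two leftmost derivations, one beginning $S \Rightarrow A$ and one beginning $S \Rightarrow B$.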


Theorem 9.20: It is undecidable whether a CFG is ambiguous. 


PROOF: We have already given most of the reduction of PCP to the question 
of whether a CFG is ambiguous; that reduction proves the problem of CFG 
ambiguity to be undecidable, since PCP is undecidable. We have only to show 
that the above construction is correct; that is: 


• $G_{AB}$ is ambiguous if and only if instance (A, B) of PCP has a solution.


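(If) Suppose $i_1, i_2, \ldots, i_m$ is a solution to this instance of PCP. Consider the
two derivations in $G_{AB}$:

$$S \Rightarrow A \Rightarrow w_{i_1}Aa_{i_1} \Rightarrow w_{i_1}w_{i_2}Aa_{i_2}a_{i_1} \Rightarrow \cdots \Rightarrow w_{i_1}w_{i_2}\cdots w_{i_{m-1}}Aa_{i_{m-1}}\cdots a_{i_2}a_{i_1} \Rightarrow w_{i_1}w_{i_2}\cdots w_{i_m}a_{i_m}\cdots a_{i_2}a_{i_1}$$

$$S \Rightarrow B \Rightarrow x_{i_1}Ba_{i_1} \Rightarrow x_{i_1}x_{i_2}Ba_{i_2}a_{i_1} \Rightarrow \cdots \Rightarrow x_{i_1}x_{i_2}\cdots x_{i_{m-1}}Ba_{i_{m-1}}\cdots a_{i_2}a_{i_1} \Rightarrow x_{i_1}x_{i_2}\cdots x_{i_m}a_{i_m}\cdots a_{i_2}a_{i_1}$$

Since $i_1, i_2, \ldots, i_m$ is a solution, we know that $w_{i_1}w_{i_2}\cdots w_{i_m} = x_{i_1}x_{i_2}\cdots x_{i_m}$.
Thus, these two derivations are derivations of the same terminal string. Since
the derivations themselves are clearly two distinct, leftmost derivations of the
same terminal string, we conclude that $G_{AB}$ is ambiguous.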


(Only-if) We already observed that a given terminal string cannot have more
than one derivation in $G_A$ and not more than one in $G_B$. So the only way that
a terminal string could have two leftmost derivations in $G_{AB}$ is if one of them
begins $S \Rightarrow A$ and continues with a derivation in $G_A$, while the other begins
$S \Rightarrow B$ and continues with a derivation of the same string in $G_B$.

The string with two derivations has a tail of indexes $a_{i_m}\cdots a_{i_2}a_{i_1}$, for some
$m \ge 1$. The index list $i_1, i_2, \ldots, i_m$ that this tail represents must be a solution
to the PCP instance, because what precedes the tail in the string with two
derivations is both $w_{i_1}w_{i_2}\cdots w_{i_m}$ and $x_{i_1}x_{i_2}\cdots x_{i_m}$.


9.5.3 The Complement of a List Language 


Having context-free languages like $L_A$ for the list A lets us show a number of
problems about CFL’s to be undecidable. More undecidability facts for CFL’s
can be obtained by considering the complement language $\overline{L_A}$. Notice that the
language $\overline{L_A}$ consists of all strings over the alphabet $\Sigma \cup \{a_1, a_2, \ldots, a_k\}$ that
are not in $L_A$, where $\Sigma$ is the alphabet of some instance of PCP, and the $a_i$’s
are distinct symbols representing the indexes of pairs in that PCP instance.
The interesting members of $\overline{L_A}$ are those strings consisting of a prefix in $\Sigma^*$
that is the concatenation of some strings from the A list, followed by a suffix
of index symbols that does not match the strings from A. However, there are
also many strings in $\overline{L_A}$ that are simply of the wrong form: they are not in the
language of regular expression $\Sigma^*(a_1 + a_2 + \cdots + a_k)^*$.

We claim that $\overline{L_A}$ is a CFL. Unlike $L_A$, it is not very easy to design a
grammar for $\overline{L_A}$, but we can design a PDA, in fact a deterministic PDA, for
$\overline{L_A}$. The construction is in the next theorem.


Theorem 9.21: If $L_A$ is the language for list A, then $\overline{L_A}$ is a context-free
language.


PROOF: Let $\Sigma$ be the alphabet of the strings on list $A = w_1, w_2, \ldots, w_k$, and
let I be the set of index symbols: $I = \{a_1, a_2, \ldots, a_k\}$. The DPDA P we design
to accept $\overline{L_A}$ works as follows.


1. As long as P sees symbols in $\Sigma$, it stores them on its stack. Since all
strings in $\Sigma^*$ are in $\overline{L_A}$, P accepts as it goes.


2. As soon as P sees an index symbol in I, say $a_i$, it pops its stack to see if
the top symbols form $w_i^R$, that is, the reverse of the corresponding string.


(a) If not, then the input seen so far, and any continuation of this input,
is in $\overline{L_A}$. Thus, P goes to an accepting state in which it consumes
all future inputs without changing its stack.


(b) If $w_i^R$ was popped from the stack, but the bottom-of-stack marker
is not yet exposed on the stack, then P accepts, but remembers in
its state that it is looking for symbols in I only, and may yet see a
string in $L_A$ (which P will not accept). P repeats step (2) as long
as the question of whether the input is in $L_A$ is unresolved.


(c) If $w_i^R$ was popped from the stack, and the bottom-of-stack marker
is exposed, then P has seen an input in $L_A$. P does not accept this
input. However, since any input continuation cannot be in $L_A$, P
goes to a state where it accepts all future inputs, leaving the stack
unchanged.


3. If, after seeing one or more symbols of I, P sees another symbol of $\Sigma$,
then the input is not of the correct form to be in $L_A$. Thus, P goes to a
state in which it accepts this and all future inputs, without changing its
stack.
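The three rules can be checked with a small Python simulation of P. This is a sketch under stated assumptions: the strings on list A are nonempty (as the construction implicitly requires), symbols of $\Sigma$ are one-character strings, index symbol $a_i$ is represented by the integer i, and the function name `dpda_accepts` is hypothetical. A Python list plays the stack, and an empty list stands for the exposed bottom-of-stack marker. Because every trap state in the construction accepts all continuations, the sketch simply returns True as soon as P would enter one.

```python
def dpda_accepts(words, tokens):
    """Run the DPDA P of Theorem 9.21 on `tokens`; True means accept.

    words[i-1] is w_i from list A (assumed nonempty).  Symbols of Sigma
    are one-character strings and index symbol a_i is the int i.
    """
    stack = []               # Sigma symbols; an empty list = bottom marker exposed
    reading_indexes = False  # True once any index symbol has been read
    for t in tokens:
        if isinstance(t, str):
            if reading_indexes:
                return True  # rule 3: a Sigma symbol after an index symbol
            stack.append(t)  # rule 1: push it; every string in Sigma* is accepted
            continue
        if reading_indexes and not stack:
            return True      # rule 2(c): the input so far is in L_A; no extension is
        reading_indexes = True
        for ch in reversed(words[t - 1]):  # rule 2: try to pop w_i reversed
            if not stack or stack.pop() != ch:
                return True  # rule 2(a): mismatch, every continuation is outside L_A
    # End of input: reject only when the whole input is in L_A, i.e. an
    # index symbol was read and the bottom marker is exposed (rule 2(c)).
    return not (reading_indexes and not stack)
```

On list A = (1, 10111, 10), for example, the string 101111110 followed by index symbols $a_3a_1a_1a_2$ is in $L_A$ and is rejected, while dropping its last index symbol leaves unmatched symbols on the stack, so the shorter string is accepted.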


We can use $L_A$, $L_B$, and their complements in various ways to show
undecidability results about context-free languages. The next theorem summarizes
some of these facts.


Theorem 9.22: Let $G_1$ and $G_2$ be context-free grammars, and let R be a
regular expression. Then the following are undecidable:

a) Is $L(G_1) \cap L(G_2) = \emptyset$?
b) Is $L(G_1) = L(G_2)$?
c) Is $L(G_1) = L(R)$?
d) Is $L(G_1) = T^*$ for some alphabet T?
e) Is $L(G_1) \subseteq L(G_2)$?
f) Is $L(R) \subseteq L(G_1)$?


PROOF: Each of the proofs is a reduction from PCP. We show how to take 
an instance (A, B) of PCP and convert it to a question about CFG’s and/or 
regular expressions that has answer “yes” if and only if the instance of PCP 
has a solution. In some cases, we reduce PCP to the question as stated in the 
theorem; in other cases we reduce it to the complement. It doesn’t matter, since 
if we show the complement of a problem to be undecidable, it is not possible 
that the problem itself is decidable, since the recursive languages are closed 
under complementation (Theorem 9.3). 

We shall refer to the alphabet of the strings for this instance as $\Sigma$ and the
alphabet of index symbols as I. Our reductions depend on the fact that $L_A$,
$L_B$, $\overline{L_A}$, and $\overline{L_B}$ all have CFG’s. We construct these CFG’s either directly, as
in Section 9.5.2, or by the construction of a PDA for the complement languages
given in Theorem 9.21 coupled with the conversion from a PDA to a CFG by
Theorem 6.14.


a) Let $L(G_1) = L_A$ and $L(G_2) = L_B$. Then $L(G_1) \cap L(G_2)$ is the set of
strings that represent solutions to this instance of PCP. The intersection is
empty if and only if there is no solution. Note that, technically, we have reduced PCP to
the language of pairs of CFG’s whose intersection is nonempty; i.e., we 
have shown the problem “is the intersection of two CFG’s nonempty” to 
be undecidable. However, as mentioned in the introduction to the proof, 
showing the complement of a problem to be undecidable is tantamount 
to showing the problem itself undecidable. 


b) Since CFL’s are closed under union, we can construct a CFG $G_1$ for
$\overline{L_A} \cup \overline{L_B}$. Since $(\Sigma \cup I)^*$ is a regular set, we surely may construct for it a
CFG $G_2$. Now $\overline{L_A} \cup \overline{L_B} = \overline{L_A \cap L_B}$. Thus, $L(G_1)$ is missing only those
strings that represent solutions to the instance of PCP. $L(G_2)$ is missing
no strings in $(\Sigma \cup I)^*$. Thus, their languages are equal if and only if the
PCP instance has no solution.


c) The argument is the same as for (b), but we let R be the regular expression
$(\Sigma \cup I)^*$.


d) The argument of (c) suffices, since $\Sigma \cup I$ is the only alphabet of which
$\overline{L_A} \cup \overline{L_B}$ could possibly be the closure.
e) Let $G_1$ be a CFG for $(\Sigma \cup I)^*$ and let $G_2$ be a CFG for $\overline{L_A} \cup \overline{L_B}$. Then
$L(G_1) \subseteq L(G_2)$ if and only if $\overline{L_A} \cup \overline{L_B} = (\Sigma \cup I)^*$, i.e., if and only if the
PCP instance has no solution.


f) The argument is the same as (e), but let R be the regular expression
$(\Sigma \cup I)^*$, and let $L(G_1)$ be $\overline{L_A} \cup \overline{L_B}$.


9.5.4 Exercises for Section 9.5 


* Exercise 9.5.1: Let L be the set of (codes for) context-free grammars G such 
that L(G) contains at least one palindrome. Show that L is undecidable. Hint: 
Reduce PCP to L by constructing, from each instance of PCP, a grammar whose
language contains a palindrome if and only if the PCP instance has a solution. 


! Exercise 9.5.2: Show that the language $\overline{L_A} \cup \overline{L_B}$ is a regular language if and
only if it is the set of all strings over its alphabet; i.e., if and only if the instance
(A, B) of PCP has no solution. Thus, prove that it is undecidable whether or
not a CFG generates a regular language. Hint: Suppose there is a solution to
PCP; say the string wx is missing from $\overline{L_A} \cup \overline{L_B}$, where w is a string from
the alphabet $\Sigma$ of this PCP instance, and x is the reverse of the corresponding
string of index symbols. Define a homomorphism $h(0) = w$ and $h(1) = x$. Then
what is $h^{-1}(\overline{L_A} \cup \overline{L_B})$? Use the closure of regular sets under inverse
homomorphism and complementation, together with the pumping lemma for
regular sets, to show that $\overline{L_A} \cup \overline{L_B}$ is not regular.


!! Exercise 9.5.3: It is undecidable whether the complement of a CFL is also a
CFL. Exercise 9.5.2 can be used to show it is undecidable whether the complement
of a CFL is regular, but that is not the same thing. To prove our initial
claim, we need to define a different language that represents the nonsolutions to
an instance (A, B) of PCP. Let $L_{AB}$ be the set of strings of the form $w\#x\#y\#z$
such that:

1. w and x are strings over the alphabet $\Sigma$ of the PCP instance.
2. y and z are strings over the index alphabet I for this instance.
3. # is a symbol in neither $\Sigma$ nor I.
4. At least one of the following holds:

(a) $w \neq x^R$.

(b) $y \neq z^R$.

(c) $x^R$ is not what the index string y generates according to list B.

(d) w is not what the index string $z^R$ generates according to the list A.

Notice that $L_{AB}$ consists of all strings in $\Sigma^*\#\Sigma^*\#I^*\#I^*$ unless the instance
(A, B) has a solution, but $L_{AB}$ is a CFL regardless. Prove that $\overline{L_{AB}}$ is a CFL
if and only if there is no solution. Hint: Use the inverse homomorphism trick
from Exercise 9.5.2 and use Ogden’s lemma to force equality in the lengths of
certain substrings as in the hint to Exercise 7.2.5(b).


9.6 Summary of Chapter 9


Recursive and Recursively Enumerable Languages: The languages ac- 
cepted by Turing machines are called recursively enumerable (RE), and 
the subset of RE languages that are accepted by a TM that always halts 
are called recursive. 


Complements of Recursive and RE Languages: The recursive languages 
are closed under complementation, and if a language and its complement 
are both RE, then both languages are actually recursive. Thus, the com- 
plement of an RE-but-not-recursive language can never be RE. 


Decidability and Undecidability: “Decidable” is a synonym for “recur- 
sive,” although we tend to refer to languages as “recursive” and prob- 
lems (which are languages interpreted as a question) as “decidable.” If 
a language is not recursive, then we call the problem expressed by that 
language “undecidable.” 


The Language $L_d$: This language is the set of strings of 0’s and 1’s that,
when interpreted as a TM, are not in the language of that TM. The
language $L_d$ is a good example of a language that is not RE; i.e., no
Turing machine accepts it.


The Universal Language: The language $L_u$ consists of strings that are
interpreted as a TM followed by an input for that TM. The string is in
$L_u$ if the TM accepts that input. $L_u$ is a good example of a language that
is RE but not recursive.


Rice’s Theorem: Any nontrivial property of the languages accepted by 
Turing machines is undecidable. For instance, the set of codes for Turing 
machines whose language is empty is undecidable by Rice’s theorem. In 
fact, this language is not RE, although its complement — the set of codes 
for TM’s that accept at least one string — is RE but not recursive. 


Post’s Correspondence Problem: This question asks, given two lists of the 
same number of strings, whether we can pick a sequence of corresponding 
strings from the two lists and form the same string by concatenation. PCP 
is an important example of an undecidable problem. PCP is a good choice 
for reducing to other problems and thereby proving them undecidable. 
Undecidable Context-Free-Language Problems: By reduction from PCP,
we can show a number of questions about CFL’s or their grammars to be 
undecidable. For instance, it is undecidable whether a CFG is ambiguous, 
whether one CFL is contained in another, or whether the intersection of 
two CFL’s is empty. 


9.7 Gradiance Problems for Chapter 9 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 9.1: We can represent questions about context-free languages and 
regular languages by choosing a standard encoding for context-free grammars 
(CFG’s) and another for regular expressions (RE’s), and phrasing the question 
as recognition of the codes for grammars and/or regular expressions such that 
their languages have certain properties. Some sets of codes are decidable, while 
others are not. 

In what follows, you may assume that G and H are context-free grammars 
with terminal alphabet {0,1}, and R is a regular expression using symbols 0 
and 1 only. You may assume that the problem “Is L(G) = (0 + 1)*?”, that is, 
the problem of recognizing all and only the codes for CFG’s G whose language 
is all strings of 0’s and 1’s, is undecidable. 

There are certain other problems about CFG’s and RE’s that are decidable, 
using well-known algorithms. For example, we can test if L(G) is empty by 
finding the pumping-lemma constant n for G, and checking whether or not 
there is a string of length n or less in L(G). It is not possible that the shortest 
string in L(G) is longer than n, because the pumping lemma lets us remove at 
least one symbol from a string that long and find a shorter string in L(G). 

You should try to determine which of the following problems are decidable, 
and which are undecidable: 


• Is Comp(L(G)) equal to (0 + 1)*? [Comp(L) is the complement of language
L with respect to the alphabet {0,1}.]
• Is Comp(L(G)) empty?
• Is L(G) intersect L(H) equal to (0 + 1)*?
• Is L(G) union L(H) equal to (0 + 1)*?
• Is L(G) finite?
• Is L(G) contained in L(H)?
• Is L
• Is L(G) = L(R)?
• Is L(G) contained in L(R)?
• Is L(R) contained in L(G)?


Then, identify the true statement from the list below. 


Problem 9.2: For the purpose of this question, we assume that all languages 
are over input alphabet {0,1}. Also, we assume that a Turing machine can 
have any fixed number of tapes. 

Sometimes restricting what a Turing machine can do does not affect the 
class of languages that can be recognized — the restricted Turing machines 
can still be designed to accept any recursively enumerable language. Other 
restrictions limit what languages the Turing machine can accept. For example, 
it might limit the languages to some subset of the recursive languages, which 
we know is smaller than the recursively enumerable languages. Here are some 
of the possible restrictions: 


• Limit the number of states the TM may have.
• Limit the number of tape symbols the TM may have.
• Limit the number of times any tape cell may change.
• Limit the amount of tape the TM may use.
• Limit the number of moves the TM may make.
• Limit the way the tape heads may move.


Consider the effect of limitations of these types, perhaps in pairs. Then, from 
the list below, identify the combination of restrictions that allows the restricted 
form of Turing machine to accept all recursively enumerable languages. 


Problem 9.3: Which of the following problems about a Turing Machine M 
does Rice’s Theorem imply is undecidable? 


Problem 9.4: Here is an instance of the Modified Post’s Correspondence 
Problem: 


i   List A   List B
1   01       010
2   11       110
3   0        01

If we apply the reduction of MPCP to PCP described in Section 9.4.2, which
of the following would be a pair in the resulting PCP instance?




Problem 9.5: We wish to perform the reduction of acceptance by a Turing 
machine to MPCP, as described in Section 9.4.3. We assume the TM M satisfies 
Theorem 8.12: it never moves left from its initial position and never writes a 
blank. We know the following: 


1. The start state of M is q. 

2. r is the accepting state of M.
3. The tape symbols of M are 0, 1, and B (blank).
4. One of the moves of M is $\delta(q, 0) = (p, 1, L)$.


Which of the following is definitely not one of the pairs in the MPCP instance 
that we construct for the TM M and the input 001? 
Chapter 10


Intractable Problems 


We now bring our discussion of what can or cannot be computed down to the 
level of efficient versus inefficient computation. We focus on problems that are 
decidable, and ask which of them can be computed by Turing machines that 
run in an amount of time that is polynomial in the size of the input. You should 
review in Section 8.6.3 two important points: 


• The problems solvable in polynomial time on a typical computer are exactly
the same as the problems solvable in polynomial time on a Turing
machine.

• Experience has shown that the dividing line between problems that can be
solved in polynomial time and those that require exponential time or more
is quite fundamental. Practical problems requiring polynomial time are
almost always solvable in an amount of time that we can tolerate, while
those that require exponential time generally cannot be solved except for
small instances.


In this chapter we introduce the theory of “intractability,” that is, techniques 
for showing problems not to be solvable in polynomial time. We start with a 
particular problem — the question of whether a boolean expression can be 
satisfied, that is, made true for some assignment of the truth values TRUE and 
FALSE to its variables. This problem plays the role for intractable problems
that $L_u$ or PCP played for undecidable problems. That is, we begin with
“Cook’s Theorem,” which strongly suggests that the satisfiability of boolean 
formulas cannot be decided in polynomial time. We then show how to reduce 
this problem to many other problems, which are therefore shown intractable as 
well. 

Since we are dealing with whether problems can be solved in polynomial 
time, our notion of a reduction must change. It is no longer sufficient that there 
be an algorithm to transform instances of one problem to instances of another. 
The algorithm itself must take at most polynomial time, or the reduction does 
not let us conclude that the target problem is intractable, even if the source
problem is. Thus, we introduce the notion of “polynomial-time reductions” in 
the first section. 

There is another important distinction between the kinds of conclusions we 
drew in the theory of undecidability and those that intractability theory lets 
us draw. The proofs of undecidability that we gave in Chapter 9 are incontro- 
vertible; they depend on nothing but the definition of a Turing machine and 
common mathematics. In contrast, the results on intractable problems that we 
give here are all predicated on an unproved, but strongly believed, assumption, 
often referred to as the assumption $\mathrm{P} \neq \mathrm{NP}$.

That is, we assume the class of problems that can be solved by nondetermin- 
istic TM’s operating in polynomial time includes at least some problems that 
cannot be solved by deterministic TM’s operating in polynomial time (even if 
we allow a higher degree polynomial for the deterministic TM). There are lit- 
erally thousands of problems that appear to be in this category, since they can 
be solved easily by a polynomial time NTM, yet no polynomial-time DTM (or 
computer program, which is the same thing) is known for their solution. More- 
over, an important consequence of intractability theory is that either all these 
problems have polynomial-time deterministic solutions, which have eluded us 
for centuries, or none do; i.e., they really require exponential time. 


10.1 The Classes P and NP 


In this section, we introduce the basic concepts of intractability theory: the 
classes P and NP of problems solvable in polynomial time by deterministic 
and nondeterministic TM’s, respectively, and the technique of polynomial-time 
reduction. We also define the notion of “NP-completeness,” a property that 
certain problems in NP have; they are at least as hard (to within a polynomial
in time) as any problem in NP. 


10.1.1 Problems Solvable in Polynomial Time 


A Turing machine M is said to be of time complexity T(n) [or to have “running 
time T(n)”] if whenever M is given an input w of length n, M halts after making 
at most T(n) moves, regardless of whether or not M accepts. This definition 
applies to any function T(n), such as $T(n) = 50n^2$ or $T(n) = 3^n + 5n^4$; we
shall be interested predominantly in the case where T(n) is a polynomial in n. 
We say a language L is in class P if there is some polynomial T(n) such that 
L = L(M) for some deterministic TM M of time complexity T(n). 


10.1.2 An Example: Kruskal’s Algorithm 


You are probably familiar with many problems that have efficient solutions; 
perhaps you studied some in a course on data structures and algorithms. These 
Is There Anything Between Polynomials and Exponentials?


In the introductory discussion, and subsequently, we shall often act as if
all programs either ran in polynomial time [time $O(n^k)$ for some integer
k] or in exponential time [time $O(2^{cn})$ for some constant $c > 0$], or more.
In practice, the known algorithms for common problems generally do fall
into one of these two categories. However, there are running times that lie
between the polynomials and the exponentials. In all that we say about
exponentials, we really mean “any running time that is bigger than all the
polynomials.”

An example of a function between the polynomials and exponentials
is $n^{\log_2 n}$. This function grows faster than any polynomial in n, since $\log n$
eventually (for large n) becomes bigger than any constant k. On the other
hand, $n^{\log_2 n} = 2^{(\log_2 n)^2}$; if you don’t see why, take logarithms of both
sides. This function grows more slowly than $2^{cn}$ for any $c > 0$. That is, no
matter how small the positive constant c is, eventually cn becomes bigger
than $(\log_2 n)^2$.
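The step “take logarithms of both sides,” together with the comparison against polynomials and exponentials, can be written out as follows (k is any fixed integer and c any positive constant; this is a worked restatement of the argument above, not an additional claim):

$$\log_2\left(n^{\log_2 n}\right) = (\log_2 n)\cdot\log_2 n = (\log_2 n)^2 = \log_2\left(2^{(\log_2 n)^2}\right)$$

$$n^k = 2^{k\log_2 n}, \qquad n^{\log_2 n} = 2^{(\log_2 n)^2}$$

$$k\log_2 n < (\log_2 n)^2 \ \text{whenever}\ \log_2 n > k, \qquad (\log_2 n)^2 < cn \ \text{for all sufficiently large}\ n$$

Comparing exponents of 2 shows that $n^{\log_2 n}$ eventually exceeds every $n^k$ yet stays below every $2^{cn}$.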


problems are generally in P. We shall consider one such problem: finding a 
minimum-weight spanning tree (MWST) for a graph. 


Informally, we think of graphs as diagrams such as that of Fig. 10.1. There 
are nodes, which are numbered 1 to 4 in this example graph, and there are edges
between some pairs of nodes. Each edge has a weight, which is an integer. A 
spanning tree is a subset of the edges such that all nodes are connected through 
these edges, yet there are no cycles. An example of a spanning tree appears 
in Fig. 10.1; it is the three edges drawn with heavy lines. A minimum-weight 
spanning tree has the least possible total edge weight of all spanning trees. 


Figure 10.1: A graph; its minimum-weight spanning tree is indicated by heavy
lines
There is a well-known “greedy” algorithm, called Kruskal’s Algorithm,¹ for
finding a MWST. Here is an informal outline of the key ideas: 


1. Maintain for each node the connected component in which the node ap- 
pears, using whatever edges of the tree have been selected so far. Initially, 
no edges are selected, so every node is then in a connected component by 
itself. 


2. Consider the lowest-weight edge that has not yet been considered; break 
ties any way you like. If this edge connects two nodes that are currently 
in different connected components then: 


(a) Select that edge for the spanning tree, and 


(b) Merge the two connected components involved, by changing the com- 
ponent number of all nodes in one of the two components to be the 
same as the component number of the other. 


If, on the other hand, the selected edge connects two nodes of the same 
component, then this edge does not belong in the spanning tree; it would 
create a cycle. 


3. Continue considering edges until either all edges have been considered, or 
the number of edges selected for the spanning tree is one less than the 
number of nodes. Note that in the latter case, all nodes must be in one 
connected component, and we can stop considering edges. 


Example 10.1: In the graph of Fig. 10.1, we first consider the edge (1,3), 
because it has the lowest weight, 10. Since 1 and 3 are initially in different 
components, we accept this edge, and make 1 and 3 have the same component 
number, say “component 1.” The next edge in order of weights is (2,3), with 
weight 12. Since 2 and 3 are in different components, we accept this edge and 
merge node 2 into “component 1.” The third edge is (1,2), with weight 15. 
However, 1 and 2 are now in the same component, so we reject this edge and 
proceed to the fourth edge, (3,4). Since 4 is not in “component 1,” we accept 
this edge. Now, we have three edges for the spanning tree of a 4-node graph, 
and so may stop. 
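The outline, and the trace of Example 10.1, can be reproduced with a short Python sketch. It assumes edges are given as (u, v, weight) triples and keeps the component numbers in a dictionary, merging two components by relabeling as in step 2(b); the function name `kruskal` is our own. The usage lines take the three weights the example states and give edge (3, 4) a hypothetical weight of 20, since the remaining weights of Fig. 10.1 are not reproduced here.

```python
def kruskal(nodes, edges):
    """Greedy MWST following the outline.

    nodes: list of node labels; edges: list of (u, v, weight) triples.
    Returns the selected edges in the order they were accepted.
    """
    component = {v: v for v in nodes}  # step 1: every node starts alone
    tree = []
    # Step 2: consider edges from lowest weight to highest (ties by input order).
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if len(tree) == len(nodes) - 1:
            break  # step 3: the spanning tree is complete
        cu, cv = component[u], component[v]
        if cu == cv:
            continue  # same component: the edge would create a cycle
        tree.append((u, v, w))  # step 2(a): select the edge
        for x in nodes:  # step 2(b): relabel one component as the other
            if component[x] == cv:
                component[x] = cu
    return tree


# Weights for (1,2), (1,3), (2,3) are from Example 10.1; 20 on (3,4) is hypothetical.
example_edges = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (3, 4, 20)]
print(kruskal([1, 2, 3, 4], example_edges))  # prints [(1, 3, 10), (2, 3, 12), (3, 4, 20)]
```

As in the example, the sketch accepts (1, 3) and (2, 3), rejects (1, 2) because both ends are already in one component, and accepts (3, 4).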


It is possible to implement this algorithm (using a computer, not a Turing
machine) on a graph with m nodes and e edges in time $O(m + e\log e)$. A
simpler, easier-to-follow implementation proceeds in e rounds. A table gives 
the current component of each node. We pick the lowest-weight remaining edge 
in O(e) time, and find the components of the two nodes connected by the edge 
in O(m) time. If they are in different components, merge all nodes with those 
numbers in O(m) time, by scanning the table of nodes. The total time taken 


1J. B. Kruskal Jr., “On the shortest spanning subtree of a graph and the traveling salesman 
problem,” Proc. AMS 7:1 (1956), pp. 48-50. 
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by this algorithm is O(e(e+m)). This running time is polynomial in the “size” 
of the input, which we might informally take to be the sum of e and m. 
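The round-based implementation just described can be sketched in Python. This is a minimal illustration, not the book's code: it assumes nodes are numbered 1 through m and edges are given as (i, j, w) triples, and the function name kruskal_rounds is our own. Each round scans every remaining edge for the lowest weight (O(e)) and, when merging, rescans the whole component table (O(m)), matching the O(e(e + m)) bound. The Fig. 10.1 edges and weights are those listed in the code of Example 10.2.

```python
def kruskal_rounds(m, edges):
    """Select spanning-tree edges with a simple round-based Kruskal sketch.

    m     -- number of nodes, numbered 1..m (an assumption of this sketch)
    edges -- list of (i, j, w) triples; the list passed in is not modified
    """
    # Step 1: initially every node is a component by itself.
    component = {node: node for node in range(1, m + 1)}
    remaining = list(edges)
    selected = []
    # Step 3: stop when edges run out or m - 1 edges have been selected.
    while remaining and len(selected) < m - 1:
        # Step 2: the lowest-weight edge not yet considered, found by an O(e) scan.
        # min() returns the first of several equal weights, one way to break ties.
        edge = min(remaining, key=lambda t: t[2])
        remaining.remove(edge)
        i, j, _ = edge
        ci, cj = component[i], component[j]
        if ci != cj:
            selected.append(edge)            # (a) select the edge
            for node in component:           # (b) O(m) scan: relabel ci as cj
                if component[node] == ci:
                    component[node] = cj
        # Otherwise both ends share a component; the edge would close a cycle.
    return selected


fig_10_1 = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]
print(kruskal_rounds(4, fig_10_1))  # [(1, 3, 10), (2, 3, 12), (3, 4, 18)]
```

On Fig. 10.1 the sketch selects (1,3), (2,3), and (3,4) and rejects (1,2), as in Example 10.1. The component numbers it assigns differ from the "component 1" naming in the text, which does not affect which edges are chosen.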

When we translate the above ideas to Turing machines, we face several 
issues: 


• When we study algorithms, we encounter “problems” that ask for outputs
in a variety of forms, such as the list of edges in a MWST. When we deal
with Turing machines, we may only think of problems as languages, and
the only output is yes or no, i.e., accept or reject. For instance, the
MWST problem could be couched as: “given this graph G and limit
W, does G have a spanning tree of weight W or less?” That problem
may seem easier to answer than the MWST problem with which we are
familiar, since we don’t even learn what the spanning tree is. However,
in the theory of intractability, we generally want to argue that a problem
is hard, not easy, and the fact that a yes-no version of a problem is
hard implies that a more standard version, where a full answer must be
computed, is also hard.


• While we might think informally of the “size” of a graph as the number
of its nodes or edges, the input to a TM is a string over a finite alphabet.
Thus, problem elements such as nodes and edges must be encoded suitably.
The effect of this requirement is that inputs to Turing machines are
generally slightly longer than the intuitive “size” of the input. However,
there are two reasons why the difference is not significant:


1. The difference between the size as a TM input string and as an 
informal problem input is never more than a small factor, usually the 
logarithm of the input size. Thus, what can be done in polynomial 
time using one measure can be done in polynomial time using the 
other measure. 


2. The length of a string representing the input is actually a more ac- 
curate measure of the number of bytes a real computer has to read 
to get its input. For instance, if a node is represented by an integer, 
then the number of bytes needed to represent that integer is propor- 
tional to the logarithm of the integer’s size, and it is not “1 byte for 
any node” as we might imagine in an informal accounting for input 
size. 


Example 10.2: Let us consider a possible code for the graphs and weight lim- 
its that could be the input to the MWST problem. The code has five symbols, 
0, 1, the left and right parentheses, and the comma. 


1. Assign integers 1 through m to the nodes. 


2. Begin the code with the value of m in binary and the weight limit W in 
binary, separated by a comma. 
3. If there is an edge between nodes i and j with weight w, place (i, j, w)
in the code. The integers i, j, and w are coded in binary. The order of 
i and j within an edge, and the order of the edges within the code are 
immaterial. 


Thus, one of the possible codes for the graph of Fig. 10.1 with limit W = 40 is 


100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)
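The code of Example 10.2 is mechanical enough to generate by program. In this minimal sketch (the function name is ours, not the book's), Python's built-in format(k, "b") supplies the binary numerals, and edges are emitted in the order given, which the text notes is immaterial. Only the five symbols 0, 1, comma, and the two parentheses appear in the output.

```python
def encode_mwst_instance(m, W, edges):
    """Return the Example 10.2 code for a graph with nodes 1..m, weight limit W,
    and edges given as (i, j, w) triples."""
    def binary(k):
        return format(k, "b")  # binary numeral without a "0b" prefix

    code = binary(m) + "," + binary(W)  # rule 2: m and W in binary, comma between
    for i, j, w in edges:               # rule 3: one (i,j,w) group per edge
        code += "(" + binary(i) + "," + binary(j) + "," + binary(w) + ")"
    return code


fig_10_1 = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]
print(encode_mwst_instance(4, 40, fig_10_1))
# 100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)
```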


If we represent inputs to the MWST problem as in Example 10.2, then
an input of length n can represent at most O(n/log n) edges. It is possible
that m, the number of nodes, could be exponential in n, if there are very few
edges. However, unless the number of edges, e, is at least m − 1, the graph
cannot be connected and therefore will have no spanning tree at all, regardless
of its edge weights. Consequently, if the number of nodes exceeds e + 1 (in
particular, if it exceeds some constant multiple of n/log n), there is no need
to run Kruskal’s algorithm at all; we simply say “no; there is no spanning tree
of that weight.”

Thus, if we have an upper bound on the running time of Kruskal’s algorithm
as a function of m and e, such as the upper bound O(e(m + e)) developed above,
we can conservatively replace both m and e by n and say that the running time,
as a function of the input length n, is O(n(n + n)), or O(n^2). In fact, a better
implementation of Kruskal’s algorithm takes time O(n log n), but we need not
concern ourselves with that improvement here.

Of course, we are using a Turing machine as our model of computation, while 
the algorithm we described was intended to be implemented in a programming 
language with useful data structures such as arrays and pointers. However, we 
claim that in O(n^2) steps we can implement the version of Kruskal’s algorithm
described above on a multitape TM. The extra tapes are used for several jobs: 


1. One tape can be used to store the nodes and their current component 
numbers. The length of this table is O(n). 


2. A tape can be used, as we scan the edges on the input tape, to hold the 
currently least edge-weight found, among those edges that have not been 
marked “used.” We could use a second track of the input tape to mark 
those edges that were selected as the edge of least remaining weight in 
some previous round of the algorithm. Scanning for the lowest-weight, 
unmarked edge takes O(n) time, since each edge is considered only once, 
and comparisons of weight can be done by a linear, right-to-left scan of 
the binary numbers. 


3. When an edge is selected in a round, place its two nodes on a tape. Search 
the table of nodes and components to find the components of these two 
nodes. This task takes O(n) time. 
4. A tape can be used to hold the two components, i and j, being merged
when an edge is found to connect two previously unconnected components. 
We then scan the table of nodes and components, and each node found 
to be in component i has its component number changed to j. This scan 
also takes O(n) time. 


You should thus be able to complete the argument that says one round can
be executed in O(n) time on a multitape TM. Since the number of rounds, e,
is at most n, we conclude that O(n^2) time suffices on a multitape TM. Now,
remember Theorem 8.10, which says that whatever a multitape TM can do in
s steps, a single-tape TM can do in O(s^2) steps. Thus, if the multitape TM
takes O(n^2) steps, then we can construct a single-tape TM to do the same thing
in O((n^2)^2) = O(n^4) steps. Our conclusion is that the yes-no version of the
MWST problem, “does graph G have a MWST of total weight W or less,” is
in P.


10.1.3 Nondeterministic Polynomial Time 


A fundamental class of problems in the study of intractability is those problems 
that can be solved by a nondeterministic TM that runs in polynomial time. 
Formally, we say a language L is in the class NP (nondeterministic polynomial)
if there is a nondeterministic TM M and a polynomial time complexity T(n) 
such that L = L(M), and when M is given an input of length n, there are no 
sequences of more than T(n) moves of M. 

Our first observation is that, since every deterministic TM is a nondeterministic
TM that happens never to have a choice of moves, P ⊆ NP. However,
it appears that NP contains many problems not in P. The intuitive reason is
that a NTM running in polynomial time has the ability to guess an exponential
number of possible solutions to a problem and check each one in polynomial
time, “in parallel.” However:


• It is one of the deepest open questions of Mathematics whether P = NP,
i.e., whether in fact everything that can be done in polynomial time by a
NTM can in fact be done by a DTM in polynomial time, perhaps with a
higher-degree polynomial.


10.1.4 An NP Example: The Traveling Salesman Problem


To get a feel for the power of NP, we shall consider an example of a problem 
that appears to be in NP but not in P: the Traveling Salesman Problem (TSP). 
The input to TSP is the same as to MWST, a graph with integer weights on 
the edges such as that of Fig. 10.1, and a weight limit W. The question asked 
is whether the graph has a “Hamilton circuit” of total weight at most W. A 
Hamilton circuit is a set of edges that connect the nodes into a single cycle,
with each node appearing exactly once. Note that the number of edges on a
Hamilton circuit must equal the number of nodes in the graph.


A Variant of Nondeterministic Acceptance


Notice that we have required of our NTM that it halt in polynomial time
along all branches, regardless of whether or not it accepts. We could just
as well have put the polynomial time bound T(n) on only those branches
that lead to acceptance; i.e., we could have defined NP as those languages
that are accepted by a NTM such that if it accepts, it does so by at least
one sequence of at most T(n) moves, for some polynomial T(n).

However, we would get the same class of languages had we done so.
For if we know that M accepts within T(n) moves if it accepts at all, then
we could modify M to count up to T(n) on a separate track of its tape and
halt without accepting if it exceeds count T(n). The modified M might
take O(T^2(n)) steps, but T^2(n) is a polynomial if T(n) is.

In fact, we could also have defined P through acceptance by TM’s
that accept within time T(n), for some polynomial T(n). These TM’s
might not halt if they do not accept. However, by the same construction
as for NTM’s, we could modify the DTM to count to T(n) and halt if the
limit is exceeded. The DTM would run in O(T^2(n)) time.


Example 10.3: The graph of Fig. 10.1 actually has only one Hamilton circuit:
the cycle (1,2,4,3,1). The total weight of this cycle is 15 + 20 + 18 + 10 = 63. 
Thus, if W is 63 or more, the answer is “yes,” and if W < 63 the answer is 
“no.” 
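The claim in Example 10.3 can be checked by brute force, which is also the exhaustive approach to the TSP: the loop visits all (m − 1)! orderings of the nodes other than node 1. This sketch uses illustrative names, not the book's. It fixes node 1 as the start of every cycle and keeps only one of the two directions, so each distinct Hamilton circuit is reported once; edge weights are keyed by the unordered pair frozenset({i, j}), and m ≥ 3 is assumed.

```python
from itertools import permutations


def hamilton_circuits(m, weights):
    """Yield (cycle, total_weight) once for each distinct Hamilton circuit.

    m       -- number of nodes, numbered 1..m (m >= 3 is assumed)
    weights -- maps frozenset({i, j}) to the weight of undirected edge (i, j)
    Every circuit is written starting and ending at node 1; of its two
    directions, only the one whose second node is smaller than its
    next-to-last node is kept.
    """
    for rest in permutations(range(2, m + 1)):
        if rest[0] > rest[-1]:
            continue  # same circuit as another ordering, traversed backward
        cycle = (1,) + rest + (1,)
        steps = [frozenset(pair) for pair in zip(cycle, cycle[1:])]
        if all(step in weights for step in steps):
            yield cycle, sum(weights[step] for step in steps)


def tsp_brute_force(m, weights, W):
    """Decide the TSP by trying every cycle: is some circuit's weight <= W?"""
    return any(total <= W for _, total in hamilton_circuits(m, weights))


fig_10_1 = {frozenset((i, j)): w
            for i, j, w in [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]}
print(list(hamilton_circuits(4, fig_10_1)))  # [((1, 2, 4, 3, 1), 63)]
print(tsp_brute_force(4, fig_10_1, 63), tsp_brute_force(4, fig_10_1, 62))  # True False
```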

However, the TSP on four-node graphs is deceptively simple, since there
can never be more than three different Hamilton circuits once we account for the
different nodes at which the same cycle can start, and for the direction in which
we traverse the cycle. In m-node graphs, the number of distinct cycles grows
as O(m!), the factorial of m, which grows faster than 2^{cm} for any constant c.
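To see how quickly the count grows, this sketch counts by brute force the distinct Hamilton circuits of the complete graph on m nodes (every pair of nodes joined by an edge, which has the most circuits of any m-node graph), using the same start-at-node-1, one-direction normalization, and checks the count against the closed form (m − 1)!/2. The function name is illustrative.

```python
from itertools import permutations
from math import factorial


def count_distinct_cycles(m):
    """Count the Hamilton circuits of the complete graph on nodes 1..m (m >= 3),
    treating rotations and reversals of the same cycle as one circuit."""
    count = 0
    for rest in permutations(range(2, m + 1)):  # every circuit can start at node 1
        if rest[0] < rest[-1]:                   # keep one of the two directions
            count += 1
    return count


for m in range(3, 9):
    distinct = count_distinct_cycles(m)
    assert distinct == factorial(m - 1) // 2     # closed form (m - 1)!/2
    print(m, distinct, 2 ** m)
```

It reports 3 circuits for m = 4. From m = 7 on, the count (360 for m = 7, 2520 for m = 8) exceeds 2^m (128 and 256), and the gap keeps widening.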


It appears that all ways to solve the TSP involve trying essentially all cycles 
and computing their total weight. By being clever, we can eliminate some 
obviously bad choices. But it seems that no matter what we do, we must 
examine an exponential number of cycles before we can conclude that there is 
none with the desired weight limit W, or to find one if we are unlucky in the 
order in which we consider the cycles. 

On the other hand, if we had a nondeterministic computer, we could guess a
permutation of the nodes, and compute the total weight for the cycle of nodes in
that order. If there were a real computer that was nondeterministic, no branch
would use more than O(n) steps if the input was of length n. On a multitape
NTM, we can guess a permutation in O(n^2) steps and check its total weight in
a similar amount of time. Thus, a single-tape NTM can solve the TSP in O(n^4)
time at most. We conclude that the TSP is in NP.
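The deterministic half of that guess-and-check strategy, verifying one guessed permutation, is short. In this sketch (function and parameter names are ours) the guess is simply passed in as order; an NTM would produce it nondeterministically, while a deterministic program would have to loop over all orderings. The check makes one pass over the m consecutive pairs, so its cost is polynomial in the input length.

```python
def check_tour(m, weights, W, order):
    """Check one guessed ordering: is it a Hamilton circuit of weight <= W?

    order   -- the guess, a tuple listing each of the nodes 1..m once
    weights -- maps frozenset({i, j}) to the weight of undirected edge (i, j)
    """
    if sorted(order) != list(range(1, m + 1)):
        return False                      # not a permutation of the nodes
    total = 0
    # Pair each node with its successor, wrapping around to close the cycle.
    for a, b in zip(order, order[1:] + order[:1]):
        edge = frozenset((a, b))
        if edge not in weights:
            return False                  # consecutive nodes must be adjacent
        total += weights[edge]
    return total <= W


fig_10_1 = {frozenset((i, j)): w
            for i, j, w in [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]}
print(check_tour(4, fig_10_1, 63, (1, 2, 4, 3)))  # True
print(check_tour(4, fig_10_1, 63, (1, 2, 3, 4)))  # False: (4, 1) is not an edge
```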


10.1.5 Polynomial-Time Reductions 


Our principal methodology for proving that a problem P₂ cannot be solved in
polynomial time (i.e., P₂ is not in P) is the reduction of a problem P₁, which is
known not to be in P, to P₂.² The approach was suggested in Fig. 8.7, which
we reproduce here as Fig. 10.2.


[Figure: an instance of P₁ goes into “Construct,” which outputs an instance of P₂; “Decide” then answers yes or no for that P₂ instance.]


Figure 10.2: Reprise of the picture of a reduction


Suppose we want to prove the statement “if P₂ is in P, then so is P₁.” Since
we claim that P₁ is not in P, we could then claim that P₂ is not in P either.
However, the mere existence of the algorithm labeled “Construct” in Fig. 10.2
is not sufficient to prove the desired statement.

For instance, suppose that when given an instance of P₁ of length m, the
algorithm produced an output string of length 2^m, which it fed to the
hypothetical polynomial-time algorithm for P₂. If that decision algorithm ran in,
say, time O(n^2), then on an input of length 2^m it would run in time O(2^{2m}),
which is exponential in m. Thus, the decision algorithm for P₁ takes, when
given an input of length m, time that is exponential in m. These facts are
entirely consistent with the situation where P₂ is in P and P₁ is not in P.

Even if the algorithm that constructs a P₂ instance from a P₁ instance
always produces an instance that is polynomial in the size of its input, we can
fail to reach our desired conclusion. For instance, suppose that the instance of
P₂ constructed is of the same size, m, as the P₁ instance, but the construction
algorithm itself takes time that is exponential in m, say O(2^m). Now, a decision
algorithm for P₂ that takes polynomial time O(n^k) on input of length n only
implies that there is a decision algorithm for P₁ that takes time O(2^m + m^k) on
input of length m. This running time bound takes into account the fact that we
have to perform the translation to P₂ as well as solve the resulting P₂ instance.
Again it would be possible for P₂ to be in P and P₁ not.

The correct restriction to place on the translation from P₁ to P₂ is that it
requires time that is polynomial in the length of its input. Note that if the
translation takes time O(m^j) on input of length m, then the output instance
of P₂ cannot be longer than the number of steps taken, i.e., it is at most cm^j
for some constant c. Now, we can prove that if P₂ is in P, then so is P₁.

²That statement is a slight lie. In practice, we only assume P₁ is not in P, using the very
strong evidence that P₁ is “NP-complete,” a concept we discuss in Section 10.1.6. We then
prove that P₂ is also “NP-complete,” and thus suggest just as strongly that P₂ is not in P.

For the proof, suppose that we can decide membership in P₂ of a string of
length n in time O(n^k). Then we can decide membership in P₁ of a string of
length m in time O(m^j + (cm^j)^k); the term m^j accounts for the time to do the
translation, and the term (cm^j)^k accounts for the time to decide the resulting
instance of P₂. Simplifying the expression, we see that P₁ can be solved in time
O(m^j + m^{jk}). Since c, j, and k are all constants, this time is polynomial in
m, and we conclude P₁ is in P.
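The simplification step can be written out. Here a and b stand for the constants hidden in the O(m^j) translation bound and the O(n^k) decision bound; these names are introduced only for this derivation:

```latex
\begin{aligned}
\mathrm{time}_{P_1}(m) &\le a\,m^{j} + b\,\bigl(c\,m^{j}\bigr)^{k}
  && \text{translate, then decide the } P_2 \text{ instance} \\
  &= a\,m^{j} + b\,c^{k}\,m^{jk} \\
  &= O\bigl(m^{j} + m^{jk}\bigr)
  && \text{since } a,\ b,\ c^{k} \text{ are constants}
\end{aligned}
```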

Thus, in the theory of intractability we shall use polynomial-time reductions
only. A reduction from P₁ to P₂ is polynomial-time if it takes time that is some
polynomial in the length of the P₁ instance. Note that as a consequence, the P₂
instance will be of a length that is polynomial in the length of the P₁ instance.


10.1.6 NP-Complete Problems 


We shall next meet the family of problems that are the best-known candidates 
for being in NP but not in P. Let L be a language (problem). We say L is
NP-complete if the following statements are true about L: 


1. L is in NP.


2. For every language L’ in NP there is a polynomial-time reduction of L’ 
to L. 


An example of an NP-complete problem, as we shall see, is the Traveling Salesman
Problem, which we introduced in Section 10.1.4. Since it appears that
P ≠ NP, and in particular, all the NP-complete problems are in NP − P, we
generally view a proof of NP-completeness for a problem as a proof that the
problem is not in P.

We shall prove our first problem, called SAT (for boolean satisfiability), to be 
NP-complete by showing that the language of every polynomial-time NTM has 
a polynomial-time reduction to SAT. However, once we have some NP-complete 
problems, we can prove a new problem to be NP-complete by reducing some 
known NP-complete problem to it, using a polynomial-time reduction. The 
following theorem shows why such a reduction proves the target problem to be 
NP-complete. 


Theorem 10.4: If P₁ is NP-complete, P₂ is in NP, and there is a polynomial-time
reduction of P₁ to P₂, then P₂ is NP-complete.


PROOF: We need to show that every language L in NP polynomial-time reduces
to P₂. We know that there is a polynomial-time reduction of L to P₁;
this reduction takes some polynomial time p(n). Thus, a string w in L of length
n is converted to a string x in P₁ of length at most p(n).
NP-Hard Problems


Some problems L are so hard that although we can prove condition (2)
of the definition of NP-completeness (every language in NP reduces to
L in polynomial time), we cannot prove condition (1): that L is in NP.
If so, we call L NP-hard. We have previously used the informal term
“intractable” to refer to problems that appeared to require exponential
time. It is generally acceptable to use “intractable” to mean “NP-hard,”
although in principle there might be some problems that require exponential
time even though they are not NP-hard in the formal sense.

A proof that L is NP-hard is sufficient to show that L is very likely to
require exponential time, or worse. However, if L is not in NP, then its
apparent difficulty does not support the argument that all NP-complete
problems are difficult. That is, it could turn out that P = NP, and yet L
still requires exponential time.


We also know that there is a polynomial-time reduction of P₁ to P₂; let
this reduction take polynomial time q(m). Then this reduction transforms x to
some string y in P₂, taking time at most q(p(n)). Thus, the transformation of
w to y takes time at most p(n) + q(p(n)), which is a polynomial. We conclude
that L is polynomial-time reducible to P₂. Since L could be any language in
NP, we have shown that all of NP polynomial-time reduces to P₂; i.e., P₂ is
NP-complete.
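The claim that p(n) + q(p(n)) is a polynomial can be made explicit. Suppose, purely for this derivation, that p(n) ≤ a n^r and q(m) ≤ b m^s for all n, m ≥ 1, where a, b, r, s are constants introduced here, and that p(n) ≥ 1. Then:

```latex
\begin{aligned}
p(n) + q\bigl(p(n)\bigr) &\le a\,n^{r} + b\,\bigl(p(n)\bigr)^{s}
  && \text{bound on } q \text{ at } m = p(n) \\
  &\le a\,n^{r} + b\,\bigl(a\,n^{r}\bigr)^{s}
  && \text{since } 0 \le p(n) \le a\,n^{r} \\
  &= a\,n^{r} + b\,a^{s}\,n^{rs}
\end{aligned}
```

The right-hand side is again a polynomial in n.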


There is one more important theorem to be proven about NP-complete
problems: if any one of them is in P, then all of NP is in P. Since we believe
strongly that there are many problems in NP that are not in P, we thus consider
a proof that a problem is NP-complete to be tantamount to a proof that it has
no polynomial-time algorithm, and thus has no good computer solution.


Theorem 10.5: If some NP-complete problem P is in P, then P = NP. 


PROOF: Suppose P is both NP-complete and in P. Then all languages L in
NP reduce in polynomial time to P. Since P is in P, each such L is in P, as we
discussed in Section 10.1.5. Thus NP ⊆ P, and since P ⊆ NP always holds, P = NP.


10.1.7 Exercises for Section 10.1 


Exercise 10.1.1: Suppose we make the following changes to the weights of 
the edges in Fig. 10.1. What would the resulting MWST be? 


* a) Change the weight 10 on edge (1,3) to 25. 


b) Instead, change the weight on edge (2,4) to 16. 
Other Notions of NP-completeness


The goal of the study of NP-completeness is really Theorem 10.5, that is, 
the identification of problems P for which their presence in the class P 
implies P = NP. The definition of “NP-complete” we have used, which is 
often called Karp-completeness because it was first used in a fundamental 
paper on the subject by R. Karp, is adequate to capture every problem 
that we have reason to believe satisfies Theorem 10.5. However, there 
are other, broader notions of NP-completeness that also allow us to claim 
Theorem 10.5. 

For instance, S. Cook, in his original paper on the subject, defined a 
problem P to be “NP-complete” if, given an oracle for the problem P, i.e., 
a mechanism that in one unit of time would answer any question about 
membership of a given string in P, it was possible to recognize any lan- 
guage in NP in polynomial time. This type of NP-completeness is called 
Cook-completeness. In a sense, Karp-completeness is the special case where 
you ask only one question of the oracle. However, Cook-completeness also 
allows complementation of the answer; e.g., you might ask the oracle a 
question and then answer the opposite of what the oracle says. A con- 
sequence of Cook’s definition is that the complements of NP-complete 
problems would also be NP-complete. Using the more restricted notion 
of Karp-completeness, as we do, we are able to make an important dis- 
tinction between the NP-complete problems (in the Karp sense) and their 
complements, in Section 11.1. 


Exercise 10.1.2: If we modify the graph of Fig. 10.1 by adding an edge of 
weight 19 between nodes 1 and 4, what is the minimum-weight Hamilton circuit? 


*! Exercise 10.1.3: Suppose that there is an NP-complete problem that has
a deterministic solution that takes time O(n^{log_2 n}). Note that this function
lies between the polynomials and the exponentials, and is in neither class of
functions. What could we say about the running time of any problem in NP?


Figure 10.3: A graph with n = 2; m = 3
! Exercise 10.1.4: Consider the graphs whose nodes are grid points in an
n-dimensional cube of side m, that is, the nodes are vectors (i₁, i₂, ..., iₙ), where
each iⱼ is in the range 1 to m. There is an edge between two nodes if and only
if they differ by one in exactly one dimension. For instance, the case n = 2 and
m = 2 is a square, n = 3 and m = 2 is a cube, and n = 2 and m = 3 is the graph
shown in Fig. 10.3. Some of these graphs have a Hamilton circuit, and some do
not. For instance, the square obviously does, and the cube does too, although it
may not be obvious; one is (0,0,0), (0,0,1), (0,1,1), (0,1,0), (1,1,0), (1,1,1),
(1,0,1), (1,0,0), and back to (0,0,0). Figure 10.3 has no Hamilton circuit.

a) Prove that Fig. 10.3 has no Hamilton circuit. Hint: Consider what hap- 
pens when a hypothetical Hamilton circuit passes through the central 
node. Where can it come from, and where can it go to, without cutting 
off one piece of the graph from the Hamilton circuit? 


b) For what values of n and m is there a Hamilton circuit? 


Exercise 10.1.5: Suppose we have an encoding of context-free grammars us- 
ing some finite alphabet. Consider the following two languages: 


1. L₁ = {(G, A, B) | G is a (coded) CFG, A and B are (coded) variables of
G, and the sets of terminal strings derived from A and B are the same}.


2. L₂ = {(G₁, G₂) | G₁ and G₂ are (coded) CFG’s, and L(G₁) = L(G₂)}.

Answer the following:

* a) Show that L₁ is polynomial-time reducible to L₂.

b) Show that L₂ is polynomial-time reducible to L₁.


* c) What do (a) and (b) say about whether or not L₁ and L₂ are
NP-complete?


Exercise 10.1.6: As classes of languages, P and NP each have certain closure 
properties. Show that P is closed under each of the following operations: 


a) Reversal.

* b) Union.

*! c) Concatenation.

! d) Closure (star).

e) Inverse homomorphism.

* f) Complementation.


Exercise 10.1.7: NP is also closed under each of the operations listed for P
in Exercise 10.1.6, with the (presumed) exception of (f) complementation. It is 
not known whether or not NP is closed under complementation, an issue we 
discuss further in Section 11.1. Prove that each of Exercise 10.1.6(a) through 
(e) holds for NP. 
10.2 An NP-Complete Problem


We now introduce you to the first NP-complete problem. This problem — 
whether a boolean expression is satisfiable — is proved NP-complete by explic- 
itly reducing the language of any nondeterministic, polynomial-time TM to the 
satisfiability problem. 


10.2.1 The Satisfiability Problem 


The boolean expressions are built from: 


1. Variables whose values are boolean; i.e., they either have the value 1 (true) 
or 0 (false). 


2. Binary operators ∧ and ∨, standing for the logical AND and OR of two
expressions.


3. Unary operator ¬ standing for logical negation.


4. Parentheses to group operators and operands, if necessary to alter the
default precedence of operators: ¬ highest, then ∧, and finally ∨.


Example 10.6: An example of a boolean expression is x ∧ ¬(y ∨ z). The
subexpression y ∨ z is true whenever either variable y or variable z has the
value true, but the subexpression is false whenever both y and z are false. The
larger subexpression ¬(y ∨ z) is true exactly when y ∨ z is false, that is, when
both y and z are false. If either y or z or both are true, then ¬(y ∨ z) is false.

Finally, consider the entire expression. Since it is the logical AND of two
subexpressions, it is true exactly when both subexpressions are true. That is,
x ∧ ¬(y ∨ z) is true exactly when x is true, y is false, and z is false.


A truth assignment for a given boolean expression E assigns either true or
false to each of the variables mentioned in E. The value of expression E given
a truth assignment T, denoted E(T), is the result of evaluating E with each
variable x replaced by the value T(x) (true or false) that T assigns to x.

A truth assignment T satisfies boolean expression E if E(T) = 1; i.e., the
truth assignment T makes expression E true. A boolean expression E is said
to be satisfiable if there exists at least one truth assignment T that satisfies E.


Example 10.7: The expression x ∧ ¬(y ∨ z) of Example 10.6 is satisfiable.
We saw that the truth assignment T defined by T(x) = 1, T(y) = 0, and
T(z) = 0 satisfies this expression, because it makes the value of the expression
true (1). We also observed that T is the only satisfying assignment for this
expression, since the other seven combinations of values for the three variables
give the expression the value false (0).

For another example, consider the expression E = x ∧ (¬x ∨ y) ∧ ¬y. We
claim that E is not satisfiable. Since there are only two variables, the number
of truth assignments is 2^2 = 4, so it is easy for you to try all four assignments
and verify that E has value 0 for all of them. However, we can also argue as
follows. E is true only if all three terms connected by ∧ are true. That means
x must be true (because of the first term) and y must be false (because of the
last term). But under that truth assignment, the middle term ¬x ∨ y is false.
Thus, E cannot be made true and is in fact unsatisfiable.

We have seen an example where an expression has exactly one satisfying
assignment and an example where it has none. There are also many examples
where an expression has more than one satisfying assignment. For a simple
example, consider F = x ∨ ¬y. The value of F is 1 for three assignments:

T₁(x) = 1; T₁(y) = 1
T₂(x) = 1; T₂(y) = 0
T₃(x) = 0; T₃(y) = 0

F has value 0 only for the fourth assignment, where x = 0 and y = 1. Thus, F
is satisfiable.
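The three expressions above can be checked by trying all assignments. This brute-force sketch (names are ours) represents each expression as a Python function of a truth assignment T, a dict of 0/1 values, and tries all 2^k assignments for k variables. That exponential enumeration is exactly the work a nondeterministic machine avoids by guessing a single assignment.

```python
from itertools import product


def satisfying_assignments(variables, expr):
    """Return every truth assignment T (a dict of 0/1 values) with E(T) = 1.

    expr is a Python function of T; all 2 ** len(variables) assignments are tried.
    """
    found = []
    for values in product((1, 0), repeat=len(variables)):
        T = dict(zip(variables, values))
        if expr(T):
            found.append(T)
    return found


def e_10_6(T):   # x ∧ ¬(y ∨ z)
    return T["x"] and not (T["y"] or T["z"])


def e_unsat(T):  # x ∧ (¬x ∨ y) ∧ ¬y
    return T["x"] and (not T["x"] or T["y"]) and not T["y"]


def f(T):        # x ∨ ¬y
    return T["x"] or not T["y"]


print(satisfying_assignments(["x", "y", "z"], e_10_6))  # [{'x': 1, 'y': 0, 'z': 0}]
print(satisfying_assignments(["x", "y"], e_unsat))      # []
print(len(satisfying_assignments(["x", "y"], f)))       # 3
```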


The satisfiability problem is:
• Given a boolean expression, is it satisfiable?


We shall generally refer to the satisfiability problem as SAT. Stated as a lan- 
guage, the problem SAT is the set of (coded) boolean expressions that are 
satisfiable. Strings that either are not valid codes for a boolean expression or 
that are codes for an unsatisfiable boolean expression are not in SAT. 


10.2.2 Representing SAT Instances 


The symbols in a boolean expression are ∧, ∨, ¬, the left and right parentheses,
and symbols representing variables. The satisfiability of an expression does
not depend on the names of the variables, only on whether two occurrences of
variables are the same variable or different variables. Thus, we may assume
that the variables are x₁, x₂, ..., although in examples we shall continue to use
variable names like y or z, as well as x’s. We shall also assume that variables are
renamed so we use the lowest possible subscripts for the variables. For instance,
we would not use x₅ unless we also used x₁ through x₄ in the same expression.
Since there are an infinite number of symbols that could in principle ap- 
pear in a boolean expression, we have a familiar problem of having to devise a 
code with a fixed, finite alphabet to represent expressions with arbitrarily large 
numbers of variables. Only then can we talk about SAT as a “problem,” that 
is, as a language over a fixed alphabet consisting of the codes for those boolean 
expressions that are satisfiable. The code we shall use is as follows: 


1. The symbols ∧, ∨, ¬, (, and ) are represented by themselves.
2. The variable xᵢ is represented by the symbol x followed by 0’s and 1’s
that represent i in binary.


Thus, the alphabet for the SAT problem/language has only eight symbols. All 
instances of SAT are strings in this fixed, finite alphabet. 


Example 10.8: Consider the expression x ∧ ¬(y ∨ z) from Example 10.6. Our
first step in coding it is to replace the variables by subscripted x’s. Since there
are three variables, we must use x₁, x₂, and x₃. We have freedom regarding
which of x, y, and z is replaced by each of the xᵢ’s, and to be specific, let x = x₁,
y = x₂, and z = x₃. Then the expression becomes x₁ ∧ ¬(x₂ ∨ x₃). The code
for this expression is:


x1∧¬(x10∨x11)
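The two coding rules can be mechanized. This is a minimal sketch, assuming the expression is written with the symbols ∧, ∨, ¬ and parentheses and that every variable name is a single character; the function name encode_sat is our own. Variables are numbered in the order given by names, so passing ["x", "y", "z"] reproduces the choice x = x₁, y = x₂, z = x₃ made in Example 10.8.

```python
def encode_sat(expr, names):
    """Code a boolean expression using the eight-symbol alphabet of Section 10.2.2.

    expr  -- the expression as a string over ∧, ∨, ¬, parentheses, spaces,
             and single-character variable names (an assumption of this sketch)
    names -- the variable names in the order they become x1, x2, ...
    """
    index = {name: i for i, name in enumerate(names, start=1)}
    out = []
    for ch in expr:
        if ch in index:
            out.append("x" + format(index[ch], "b"))  # rule 2: x, then i in binary
        elif ch in "∧∨¬()":
            out.append(ch)                            # rule 1: represented by itself
        elif not ch.isspace():
            raise ValueError(f"unexpected symbol {ch!r}")
    return "".join(out)


print(encode_sat("x ∧ ¬(y ∨ z)", ["x", "y", "z"]))  # x1∧¬(x10∨x11)
```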


Notice that the length of a coded boolean expression is approximately the 
same as the number of positions in the expression, counting each variable oc- 
currence as 1. The reason for the difference is that if the expression has m 
positions, it can have O(m) variables, so variables may take O(log m) symbols 
to code. Thus, an expression whose length is m positions can have a code as 
long as n = O(m log m) symbols. 

However, the difference between m and m log m is surely limited by a polynomial.
Thus, as long as we only deal with the issue of whether or not a problem
can be solved in time that is polynomial in its input length, there is no need 
to distinguish between the length of an expression’s code and the number of 
positions in the expression itself. 


10.2.3 NP-Completeness of the SAT Problem 


We now prove “Cook’s Theorem,” the fact that SAT is NP-complete. To prove
a problem is NP-complete, we need first to show that it is in NP. Then, we
must show that every language in NP reduces to the problem in question. In
general, we show the second part by offering a polynomial-time reduction from
some other NP-complete problem, and then invoking Theorem 10.4. But right
now, we don’t know any NP-complete problems to reduce to SAT. Thus, the
only strategy available is to reduce absolutely every problem in NP to SAT.


Theorem 10.9: (Cook’s Theorem) SAT is NP-complete. 


PROOF: The first part of the proof is showing that SAT is in NP. This part
is easy: 


1. Use the nondeterministic ability of an NTM to guess a truth assignment 
T for the given expression E. If the encoded E is of length n, then O(n) 
time suffices on a multitape NTM. Note that this NTM has many choices 
of move, and may have as many as 2^n different ID’s reached at the end of
the guessing process, where each branch represents the guess of a different
truth assignment.


2. Evaluate E for the truth assignment T. If E(T) = 1, then accept. Note 
that this part is deterministic. The fact that other branches of the NTM 
may not lead to acceptance has no bearing on the outcome, since if even 
one satisfying truth assignment is found, the NTM accepts. 


The evaluation can be done easily in O(n^2) time on a multitape NTM. Thus, the
entire recognition of SAT by the multitape NTM takes O(n^2) time. Converting
to a single-tape NTM may square the amount of time, so O(n^4) time suffices
on a single-tape NTM.
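Step 2 of the proof, evaluating E for the guessed T, can be done deterministically by a single left-to-right parse of the coded expression. The recursive-descent evaluator below is one way to do it (our choice, not the book's construction), assuming T maps each variable index i to 0 or 1; the guessing of step 1 is replaced by passing T in. Each symbol is examined a constant number of times, so on an ordinary computer the check takes roughly linear time; the O(n^2) bound in the text refers to a multitape NTM. Very deeply nested expressions would hit Python's recursion limit.

```python
def evaluate_code(code, T):
    """Evaluate a coded boolean expression under the truth assignment T.

    code -- a string over the eight-symbol alphabet, e.g. "x1∧¬(x10∨x11)"
    T    -- maps each variable index i (an int) to 0 or 1
    Precedence follows Section 10.2.1: ¬ highest, then ∧, and finally ∨.
    """
    pos = 0

    def peek():
        return code[pos] if pos < len(code) else None

    def parse_or():            # expr := term { ∨ term }
        nonlocal pos
        value = parse_and()
        while peek() == "∨":
            pos += 1
            value = value | parse_and()
        return value

    def parse_and():           # term := factor { ∧ factor }
        nonlocal pos
        value = parse_not()
        while peek() == "∧":
            pos += 1
            value = value & parse_not()
        return value

    def parse_not():           # factor := ¬ factor | ( expr ) | variable
        nonlocal pos
        symbol = peek()
        if symbol == "¬":
            pos += 1
            return 1 - parse_not()
        if symbol == "(":
            pos += 1
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if symbol == "x":
            pos += 1
            start = pos
            while peek() in ("0", "1"):
                pos += 1
            if start == pos:
                raise ValueError("variable index missing after 'x'")
            return T[int(code[start:pos], 2)]  # the index is written in binary
        raise ValueError(f"unexpected symbol at position {pos}")

    result = parse_or()
    if pos != len(code):
        raise ValueError("unexpected trailing symbols")
    return result


print(evaluate_code("x1∧¬(x10∨x11)", {1: 1, 2: 0, 3: 0}))  # 1
```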

Now, we must prove the hard part: that if L is any language in NP, then
there is a polynomial-time reduction of L to SAT. We may assume that there 
is some single-tape NTM M and a polynomial p(n) such that M takes no 
more than p(n) steps on an input of length n, along any branch. Further, the 
restrictions of Theorem 8.12, which we proved for DTM’s, can be proved in the 
same way for NTM’s. Thus, we may assume that M never writes a blank, and 
never moves its head left of its initial head position. 

Thus, if M accepts an input w, and |w| = n, then there is a sequence of
moves of M such that:

1. $\alpha_0$ is the initial ID of M with input w.

2. $\alpha_0 \vdash \alpha_1 \vdash \cdots \vdash \alpha_k$, where $k \le p(n)$.

3. $\alpha_k$ is an ID with an accepting state.

4. Each $\alpha_i$ consists of nonblanks only (except if $\alpha_i$ ends in a state and a
blank), and extends from the initial head position (the leftmost input
symbol) to the right.


Our strategy can be summarized as follows. 


a) Each $\alpha_i$ can be written as a sequence of symbols $X_{i0}X_{i1}\cdots X_{i,p(n)}$. One
of these symbols is a state, and the others are tape symbols. As always,
we assume that the states and tape symbols are disjoint, so we can tell
which $X_{ij}$ is the state, and therefore tell where the tape head is. Note
that there is no reason to represent symbols to the right of the first p(n)
symbols on the tape [which with the state makes an ID of length p(n) + 1],
because they cannot influence a move of M if M is guaranteed to halt
after p(n) moves or less.


b) To describe the sequence of ID’s in terms of boolean variables, we create
variable $y_{ijA}$ to represent the proposition that $X_{ij} = A$. Here, i and j are
each integers in the range 0 to p(n), and A is either a tape symbol or a
state.
c) We express the condition that the sequence of ID’s represents acceptance
of an input w by writing a boolean expression that is satisfiable if and
only if M accepts w by a sequence of at most p(n) moves. The satisfying
assignment will be the one that “tells the truth” about the ID’s; that is,
$y_{ijA}$ will be true if and only if $X_{ij} = A$. To make sure that the polynomial-time
reduction of L(M) to SAT is correct, we write this expression so that
it says the computation:


i. Starts right. That is, the initial ID is $q_0w$ followed by blanks.


ii. Next move is right (i.e., the move correctly follows the rules of the
TM). That is, each subsequent ID follows from the previous by one 
of the possible legal moves of M. 


iii. Finishes right. That is, there is some ID that has an accepting state.


There are a few details that must be introduced before we can make the 
construction of our boolean expression precise. 


• First, we have specified ID’s to end when the infinite tail of blanks begins.
However, it is more convenient when simulating a polynomial-time computation
to think of all ID’s as having the same length, p(n) + 1. Thus,
a tail of blanks may be present in an ID.


• Second, it is convenient to assume that all computations continue for
exactly p(n) moves [and therefore have p(n) + 1 ID’s], even if acceptance
occurs earlier. We therefore allow each ID with an accepting state to be
its own successor. That is, if $\alpha$ has an accepting state, we allow a “move”
$\alpha \vdash \alpha$. Thus, we can assume that if there is an accepting computation,
then $\alpha_{p(n)}$ will be an accepting ID, and that is all we have to check for
the condition “finishes right.”


Figure 10.4 suggests what a polynomial-time computation of M looks like. The 
rows correspond to the sequence of ID’s, and the columns are the cells of the 
tape that can be used in the computation. Notice that the number of squares 
in Fig. 10.4 is $(p(n) + 1)^2$. Also, the number of variables that represent each
square is finite, depending only on M; it is the sum of the number of states and 
tape symbols of M. 

Let us now give an algorithm to construct from M and w a boolean expression
$E_{M,w}$. The overall form of $E_{M,w}$ is $U \wedge S \wedge N \wedge F$, where S, N, and F
are expressions that say M starts, moves, and finishes right, and U says there
is a unique symbol in each cell.


Unique 


U is the logical AND of all terms of the form $\neg(y_{ij\alpha} \wedge y_{ij\beta})$, where $\alpha \neq \beta$. Note
that the number of these terms is $O(p^2(n))$.
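A minimal sketch that lists the terms of U and checks the count, assuming a hypothetical key (i, j, X) for the variable $y_{ijX}$ and the same nested-tuple style for formulas (both our own choices). With s symbols (states plus tape symbols) it produces $(p(n)+1)^2\binom{s}{2}$ terms, which is $O(p^2(n))$ because s depends only on M. It uses each unordered pair once, since $\neg(a \wedge b)$ and $\neg(b \wedge a)$ say the same thing.

```python
from itertools import combinations

def unique_terms(p, symbols):
    """Yield the terms of U: not(y[i,j,a] and y[i,j,b]) for every cell (i, j), a != b.

    Rows i and columns j both range over 0..p, where p stands for p(n);
    `symbols` lists the states and tape symbols of M.
    """
    for i in range(p + 1):
        for j in range(p + 1):
            for a, b in combinations(symbols, 2):
                yield ("not", ("and", ("var", (i, j, a)), ("var", (i, j, b))))


# (p + 1)^2 cells times C(s, 2) pairs: quadratic in p for a fixed machine.
print(sum(1 for _ in unique_terms(3, ["q0", "q1", "0", "B"])))  # 96
```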


10.2. AN NP-COMPLETE PROBLEM 443 


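[Array: one row for each ID $\alpha_0, \alpha_1, \ldots, \alpha_{p(n)}$ and one column for each tape cell $0, 1, \ldots, p(n)$; the entry in row i and column j is $X_{ij}$.]

Figure 10.4: Constructing the array of cell/ID facts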


Starts Right 


$X_{00}$ must be the start state $q_0$ of M, $X_{01}$ through $X_{0n}$ must be w (where n
is the length of w), and the remaining $X_{0j}$ must be the blank, B. That is, if
$w = a_1a_2\cdots a_n$, then:


$S = y_{00q_0} \wedge y_{01a_1} \wedge y_{02a_2} \wedge \cdots \wedge y_{0na_n} \wedge y_{0,n+1,B} \wedge y_{0,n+2,B} \wedge \cdots \wedge y_{0,p(n),B}$


Surely, given the encoding of M and given w, we can write S in O(p(n)) time 
on a second tape of a multitape TM. 
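A short sketch of writing S, assuming hypothetical names "q0" and "B" for the start state and blank and the key (0, j, X) for $y_{0jX}$. It returns the conjuncts of S in order; the output has exactly p(n) + 1 entries, so it is produced in O(p(n)) steps.

```python
def starts_right(w, p, q0="q0", blank="B"):
    """Return the conjuncts of S as variable keys (0, j, X), each meaning y_{0jX}.

    Row 0 must be q0, then the n input symbols, then blanks: p + 1 cells in all.
    The names q0 and B are placeholders for M's start state and blank.
    """
    if p < len(w):
        raise ValueError("p(n) must be at least n = |w|")
    row = [q0] + list(w) + [blank] * (p - len(w))
    return [(0, j, x) for j, x in enumerate(row)]


print(starts_right("01", 4))
# [(0, 0, 'q0'), (0, 1, '0'), (0, 2, '1'), (0, 3, 'B'), (0, 4, 'B')]
```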


Finishes Right 


Since we assume that an accepting ID repeats forever, acceptance by M is the
same as finding an accepting state in $\alpha_{p(n)}$. Remember that we assume M is
an NTM that, if it accepts, does so within p(n) steps. Thus, F is the OR of
expressions $F_j$, for $j = 0, 1, \ldots, p(n)$, where $F_j$ says that $X_{p(n),j}$ is an accepting
state. That is, $F_j$ is $y_{p(n),j,a_1} \vee y_{p(n),j,a_2} \vee \cdots \vee y_{p(n),j,a_k}$, where $a_1, a_2, \ldots, a_k$
are all the accepting states of M. Then, $F = F_0 \vee F_1 \vee \cdots \vee F_{p(n)}$.

Each $F_j$ uses a constant number of symbols, depending on M but not on
the length n of its input w. Thus, F has length O(p(n)). More importantly, the
time to write F, given an encoding of M and the input w, is polynomial in n;
actually, F can be written in O(p(n)) time on a multitape TM.
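A matching sketch for F, again assuming the hypothetical key (i, j, X) for $y_{ijX}$. F is returned as a list of disjuncts $F_0, \ldots, F_{p(n)}$, each a list of variables; there are $(p(n)+1)\cdot k$ variables in all, where k is the (constant) number of accepting states. The helper `holds_F` evaluates F against a final ID given as a list of cell contents.

```python
def finishes_right(p, accepting):
    """Return F as a list of disjuncts F_0, ..., F_p; F_j lists keys (p, j, a).

    F_j is the OR of y_{p,j,a} over the accepting states a; F is the OR of the F_j.
    """
    return [[(p, j, a) for a in accepting] for j in range(p + 1)]


def holds_F(F, last_row):
    """Evaluate F under the assignment that describes the final ID `last_row`."""
    return any(last_row[j] == a for disjunct in F for (_, j, a) in disjunct)


F = finishes_right(3, ["qa"])
print(len(F), holds_F(F, ["0", "qa", "1", "B"]), holds_F(F, ["0", "q", "1", "B"]))
# 4 True False
```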
Next Move is Right


Assuring that the moves of M are correct is by far the most complicated part.
The expression N will be the AND of expressions $N_i$, for $i = 0, 1, \ldots, p(n) - 1$,
and each $N_i$ will be designed to assure that ID $\alpha_{i+1}$ is one of the ID’s that
M allows to follow $\alpha_i$. To begin the explanation of how to write $N_i$, observe
symbol $X_{i+1,j}$ in Fig. 10.4. We can always determine $X_{i+1,j}$ from:


1. The three symbols above it: $X_{i,j-1}$, $X_{ij}$, and $X_{i,j+1}$, and


2. If one of these symbols is the state of $\alpha_i$, then the particular choice of
move by the NTM M. 


We shall write $N_i$ as the $\wedge$ of expressions $A_{ij} \vee B_{ij}$, where $j = 0, 1, \ldots, p(n)$.


• Expression $A_{ij}$ says that:


a) The state of $\alpha_i$ is at position j (i.e., $X_{ij}$ is the state), and


b) There is a choice of move of M, where $X_{ij}$ is the state and $X_{i,j+1}$ is
the symbol scanned, such that this move transforms the sequence of
symbols $X_{i,j-1}X_{ij}X_{i,j+1}$ into $X_{i+1,j-1}X_{i+1,j}X_{i+1,j+1}$. Note that if
$X_{ij}$ is an accepting state, there is the “choice” of making no move
at all, so all subsequent ID’s are the same as the one that first led
to acceptance.


• Expression $B_{ij}$ says that:


a) The state of $\alpha_i$ is not at position j (i.e., $X_{ij}$ is not a state), and

b) If the state of $\alpha_i$ is not adjacent to position j (i.e., $X_{i,j-1}$ and $X_{i,j+1}$
are not states either), then $X_{i+1,j} = X_{ij}$.


Note that when the state is adjacent to position j, then the correctness
of position j will be taken care of by $A_{i,j-1}$ or $A_{i,j+1}$.


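$B_{ij}$ is the easier to write. Let $q_1, q_2, \ldots, q_m$ be the states of M, and let
$Z_1, Z_2, \ldots, Z_r$ be the tape symbols. Then:

$$
\begin{aligned}
B_{ij} = {} & (y_{i,j-1,q_1} \vee y_{i,j-1,q_2} \vee \cdots \vee y_{i,j-1,q_m}) \vee \\
& (y_{i,j+1,q_1} \vee y_{i,j+1,q_2} \vee \cdots \vee y_{i,j+1,q_m}) \vee \\
& (y_{i,j,Z_1} \vee y_{i,j,Z_2} \vee \cdots \vee y_{i,j,Z_r}) \wedge \\
& \bigl((y_{i,j,Z_1} \wedge y_{i+1,j,Z_1}) \vee (y_{i,j,Z_2} \wedge y_{i+1,j,Z_2}) \vee \cdots \vee (y_{i,j,Z_r} \wedge y_{i+1,j,Z_r})\bigr)
\end{aligned}
$$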


The first two lines of $B_{ij}$ guarantee that $B_{ij}$ is true whenever the state of $\alpha_i$ is
adjacent to position j. The first three lines together guarantee that if the state
of $\alpha_i$ is at position j, then $B_{ij}$ is false, and the truth of $N_i$ depends solely on
$A_{ij}$ being true; i.e., on the move being legal. And when the state is at least two
positions away from position j, the last two lines assure that the symbol must
not change. Note the final line says that $X_{ij} = X_{i+1,j}$ by saying that either
both are $Z_1$, or both are $Z_2$, and so on.

There are two important special cases: either j = 0 or j = p(n). In one case
there are no variables $y_{i,j-1,X}$, and in the other, no variables $y_{i,j+1,X}$. However,
we know the head never moves to the left of its initial position, and we know it
will not have time to get more than p(n) cells to the right of where it started.
Thus, we may eliminate certain terms from $B_{i0}$ and $B_{i,p(n)}$; we leave you to
make the simplification.
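The following sketch builds $B_{ij}$ as a formula and evaluates it against a small array of cell contents. It assumes a hypothetical encoding in which ("var", (i, j, X)) stands for $y_{ijX}$, and it handles the boundary cases in one reasonable way: the disjunct for a nonexistent neighbor is simply omitted. The four commented pieces correspond to the four lines of the formula for $B_{ij}$.

```python
from functools import reduce

# Hypothetical encoding: ("var", (i, j, X)) stands for y_{ijX}.

def big_or(terms):
    return reduce(lambda a, b: ("or", a, b), terms)


def make_B(i, j, p, states, tape):
    """Build B_ij; at j = 0 or j = p the term for the missing neighbor is dropped."""
    disjuncts = []
    if j > 0:      # line 1: the state is at position j - 1
        disjuncts.append(big_or([("var", (i, j - 1, q)) for q in states]))
    if j < p:      # line 2: the state is at position j + 1
        disjuncts.append(big_or([("var", (i, j + 1, q)) for q in states]))
    here_is_tape = big_or([("var", (i, j, z)) for z in tape])            # line 3
    unchanged = big_or([("and", ("var", (i, j, z)), ("var", (i + 1, j, z)))
                        for z in tape])                                  # line 4
    disjuncts.append(("and", here_is_tape, unchanged))
    return big_or(disjuncts)


def holds(e, grid):
    """Evaluate e under the assignment that "tells the truth" about grid[i][j]."""
    op = e[0]
    if op == "var":
        i, j, x = e[1]
        return grid[i][j] == x
    if op == "not":
        return not holds(e[1], grid)
    if op == "and":
        return holds(e[1], grid) and holds(e[2], grid)
    return holds(e[1], grid) or holds(e[2], grid)  # op == "or"


states, tape = ["q", "r"], ["0", "1", "B"]
grid = [["q", "0", "1", "B"],
        ["0", "r", "1", "B"]]
print([holds(make_B(0, j, 3, states, tape), grid) for j in range(4)])
# [False, True, True, True]: B_00 is false because the state sits at position 0
```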

Now, let us consider the expressions $A_{ij}$. These expressions reflect all possible
relationships among the $2 \times 3$ rectangle of symbols in the array of Fig. 10.4:
$X_{i,j-1}$, $X_{ij}$, $X_{i,j+1}$, $X_{i+1,j-1}$, $X_{i+1,j}$, and $X_{i+1,j+1}$. An assignment of symbols
to each of these six variables is valid if:


1. $X_{ij}$ is a state, but $X_{i,j-1}$ and $X_{i,j+1}$ are tape symbols.


2. There is a move of M that explains how $X_{i,j-1}X_{ij}X_{i,j+1}$ becomes
$X_{i+1,j-1}X_{i+1,j}X_{i+1,j+1}$.


There are thus a finite number of assignments of symbols to the six variables 
that are valid. Let $A_{ij}$ be the OR of terms, one term for each set of six variables
that form a valid assignment. 

For instance, suppose that one move of M comes from the fact that $\delta(q, A)$
contains $(p, C, L)$. Let D be some tape symbol of M. Then one valid assignment
is $X_{i,j-1}X_{ij}X_{i,j+1} = DqA$ and $X_{i+1,j-1}X_{i+1,j}X_{i+1,j+1} = pDC$. Notice how
this assignment reflects the change in ID that is caused by making this move of
M. The term that reflects this possibility is


$y_{i,j-1,D} \wedge y_{i,j,q} \wedge y_{i,j+1,A} \wedge y_{i+1,j-1,p} \wedge y_{i+1,j,D} \wedge y_{i+1,j+1,C}$


If, instead, $\delta(q, A)$ contains $(p, C, R)$ (i.e., the move is the same, but the head
moves right), then the corresponding valid assignment is $X_{i,j-1}X_{ij}X_{i,j+1} = DqA$
and $X_{i+1,j-1}X_{i+1,j}X_{i+1,j+1} = DCp$. The term for this assignment is


$y_{i,j-1,D} \wedge y_{i,j,q} \wedge y_{i,j+1,A} \wedge y_{i+1,j-1,D} \wedge y_{i+1,j,C} \wedge y_{i+1,j+1,p}$
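Because there are only finitely many valid assignments, they can be enumerated mechanically from the transition function. The sketch below does so, assuming a hypothetical dictionary `delta` that maps (state, symbol) to a set of moves (new state, written symbol, "L" or "R"); each resulting (top, bottom) pair of triples corresponds to one six-variable term of $A_{ij}$. It also adds the "no move" windows for accepting states. The boundary cases j = 0 and j = p(n) are not handled here.

```python
def valid_windows(delta, tape, accepting=()):
    """Enumerate valid (top, bottom) windows for A_ij, where
    top = X_{i,j-1} X_ij X_{i,j+1} and bottom = X_{i+1,j-1} X_{i+1,j} X_{i+1,j+1}.

    delta maps (state, symbol) to a set of moves (new_state, written, "L" or "R").
    """
    windows = set()
    for (q, a), moves in delta.items():
        for d in tape:                      # d = symbol just left of the head
            top = (d, q, a)
            for p, c, direction in moves:
                if direction == "L":
                    windows.add((top, (p, d, c)))   # new state now scans d
                else:
                    windows.add((top, (d, c, p)))   # new state sits right of c
    for q in accepting:                     # an accepting ID may be its own successor
        for d in tape:
            for a in tape:
                windows.add(((d, q, a), (d, q, a)))
    return windows


delta = {("q", "A"): {("p", "C", "L"), ("p", "C", "R")}}
W = valid_windows(delta, ["A", "C", "D"])
print((("D", "q", "A"), ("p", "D", "C")) in W)  # True: the L move from the text
print((("D", "q", "A"), ("D", "C", "p")) in W)  # True: the R move
print(len(W))                                   # 6
```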


$A_{ij}$ is the OR of all valid terms. In the special cases j = 0 and j = p(n),
we must make certain modifications to reflect the nonexistence of the variables
$y_{ijZ}$ for j < 0 or j > p(n), as we did for $B_{ij}$. Finally,

$N_i = (A_{i0} \vee B_{i0}) \wedge (A_{i1} \vee B_{i1}) \wedge \cdots \wedge (A_{i,p(n)} \vee B_{i,p(n)})$


and then 
$N = N_0 \wedge N_1 \wedge \cdots \wedge N_{p(n)-1}$


Although $A_{ij}$ and $B_{ij}$ can be very large if M has many states and/or tape
symbols, their size is actually a constant as far as the length of input w is
concerned; that is, their size is independent of n, the length of w. Thus, the
length of $N_i$ is O(p(n)), and the length of N is $O(p^2(n))$. More importantly,
we can write N on a tape of a multitape TM in an amount of time that is 
proportional to its length, and that amount of time is polynomial in n, the 
length of w. 


Conclusion of the Proof of Cook’s Theorem 


Although we have described the construction of the expression
$E_{M,w} = U \wedge S \wedge N \wedge F$


as a function of both M and w, observe that it is only the “starts right” part 
S that depends on w, and it does so in a simple way (w is on the tape of the 
initial ID). The other parts, U, N, and F, depend on M and on n, the length of w,
only.

Thus, for any NTM M that runs in some polynomial time p(n), we can
devise an algorithm that takes an input w of length n, and produces $E_{M,w}$. The
running time of this algorithm on a multitape, deterministic TM is $O(p^2(n))$,
and that multitape TM can be converted to a single-tape TM that runs in time
$O(p^4(n))$. The output of this algorithm is a boolean expression $E_{M,w}$ that is
satisfiable if and only if M accepts w within p(n) moves.
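Collecting the length estimates given for the four parts, and counting each variable $y_{ijA}$ as a single symbol as those estimates do, the size of the expression is summarized below. This is only a restatement of the bounds above, not an additional claim; squaring $O(p^2(n))$ for the single-tape conversion gives the $O(p^4(n))$ running time.

$$
|E_{M,w}| \;=\; \underbrace{O\bigl(p^2(n)\bigr)}_{U} + \underbrace{O\bigl(p(n)\bigr)}_{S} + \underbrace{O\bigl(p^2(n)\bigr)}_{N} + \underbrace{O\bigl(p(n)\bigr)}_{F} \;=\; O\bigl(p^2(n)\bigr)
$$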


To emphasize the importance of Cook’s Theorem 10.9, let us see how The- 
orem 10.5 applies to it. Suppose SAT had a deterministic TM that recognized 
its instances in polynomial time, say time q(n). Then every language accepted 
by an NTM M that accepted within polynomial time p(n) would be accepted 
in deterministic polynomial time by the DTM whose operation is suggested by 
Fig. 10.5. The input w to M is converted to a boolean expression $E_{M,w}$. This
expression is fed to the SAT tester, and whatever this tester answers about
$E_{M,w}$, our algorithm answers about w.


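[Diagram: the input w enters a “polynomial-time converter for M,” which produces $E_{M,w}$; this is fed to a “SAT decider,” whose answer, yes or no, is the final answer.]

Figure 10.5: If SAT is in P, then every language in NP could be shown to be
in P by a DTM designed in this manner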
10.2.4 Exercises for Section 10.2


Exercise 10.2.1: How many satisfying truth assignments do the following 
boolean expressions have? Which are in SAT? 


* a) $x \wedge (y \vee \neg x) \wedge (z \vee \neg y)$.
b) $(x \vee y) \wedge \bigl(\neg(x \vee z) \vee (\neg z \wedge \neg y)\bigr)$.


Exercise 10.2.2: Suppose G is a graph of four nodes: 1, 2, 3, and 4. Let
$x_{ij}$, for $1 \le i < j \le 4$, be a propositional variable that we interpret as saying
“there is an edge between nodes i and j.” Any graph on these four nodes can
be represented by a truth assignment. For instance, the graph of Fig. 10.1
is represented by making $x_{14}$ false and the other five variables true. For any
property of the graph that involves only the existence or nonexistence of edges, 
we can express that property as a boolean expression that is true if and only if 
the truth assignment to the variables describes a graph that has the property. 
Write expressions for the following properties: 


* a) G has a Hamilton circuit. 
b) G is connected. 


c) G contains a clique of size 3, that is, a set of three nodes such that there 
is an edge between every two of them (i.e., a triangle in the graph). 


d) G contains at least one isolated node, that is, a node with no edges. 


10.3 A Restricted Satisfiability Problem 


Our plan is to demonstrate a wide variety of problems, such as the TSP problem 
mentioned in Section 10.1.4, to be NP-complete. In principle, we do so by 
finding polynomial-time reductions from the problem SAT to each problem of 
interest. However, there is an important intermediate problem, called “3SAT,” 
that is much easier than SAT to reduce to typical problems. 3SAT is still a 
problem about satisfiability of boolean expressions, but these expressions have 
a very regular form: they are the AND of “clauses,” each of which is the OR 
of exactly three variables or negated variables. 

In this section we introduce some important terminology about boolean 
expressions. We then reduce satisfiability for any expression to satisfiability 
for expressions in the normal form for the 3SAT problem. It is interesting to 
observe that, while every boolean expression E has an equivalent expression F 
in the normal form of 3SAT, the size of F may be exponential in the size of 
E. Thus, our polynomial-time reduction of SAT to 3SAT must be more subtle 
than simple boolean-algebra manipulation. We need to convert each expression 
E in SAT to another expression F in the normal form for 3SAT. Yet F is not 
necessarily equivalent to E. We can be sure only that F is satisfiable if and 
only if E is. 
10.3.1 Normal Forms for Boolean Expressions


The following are three essential definitions: 


• A literal is either a variable, or a negated variable. Examples are x and
$\neg y$. To save space, we shall often use an overbar $\bar{y}$ in place of a literal
such as $\neg y$.


• A clause is the logical OR of one or more literals. Examples are x, $x \vee y$,
and $x \vee \bar{y} \vee z$.


• A boolean expression is said to be in conjunctive normal form³, or CNF,
if it is the AND of clauses.


To further compress the expressions we write, we shall adopt the alternative 
notation in which $\vee$ is treated as a sum, using the + operator, and $\wedge$ is treated
as a product. For products, we normally use juxtaposition, i.e., no operator,
just as we do for concatenation in regular expressions. It is also then natural 
to refer to a clause as a “sum of literals” and a CNF expression as a “product 
of clauses.” 


Example 10.10: The expression $(x \vee \neg y) \wedge (\neg x \vee z)$ will be written in our
compressed notation as $(x + \bar{y})(\bar{x} + z)$. It is in conjunctive normal form, since
it is the AND (product) of the clauses $(x + \bar{y})$ and $(\bar{x} + z)$.

Expression $(x + y\bar{z})(x + y + z)(\bar{y} + \bar{z})$ is not in CNF. It is the AND of three
subexpressions, $(x + y\bar{z})$, $(x + y + z)$, and $(\bar{y} + \bar{z})$. The last two are clauses, but
the first is not; it is the sum of a literal and a product of two literals.

Expression xyz is in CNF. Remember that a clause can have only one literal. 
Thus, our expression is the product of three clauses, (x), (y), and (z). 


An expression is said to be in k-conjunctive normal form (k-CNF) if it is 
the product of clauses, each of which is the sum of exactly k distinct literals. 
For instance, $(x + \bar{y})(y + \bar{z})(z + \bar{x})$ is in 2-CNF, because each of its clauses has
exactly two literals. 

All of these restrictions on boolean expressions give rise to their own prob- 
lems about satisfiability for expressions that meet the restriction. Thus, we 
shall speak of the following problems: 


• CSAT is the problem: given a boolean expression in CNF, is it satisfiable?


• kSAT is the problem: given a boolean expression in k-CNF, is it satisfiable?


We shall see that CSAT, 3SAT, and kSAT for all k higher than 3 are NP-complete.
However, there are linear-time algorithms for 1SAT and 2SAT.


³ “Conjunction” is a fancy term for logical AND.
Handling Bad Input


Each of the problems we have discussed (SAT, CSAT, 3SAT, and so
on) is a language over a fixed, 8-symbol alphabet, whose strings we
sometimes may interpret as boolean expressions. A string that is not
interpretable as an expression cannot be in the language SAT. Likewise, 
when we consider expressions of restricted form, a string that is a well- 
formed boolean expression, but not an expression of the required form, 
is never in the language. Thus, an algorithm that decides the CSAT 
problem, for example, will say “no” if it is given a boolean expression that 
is satisfiable, but not in CNF. 


10.3.2 Converting Expressions to CNF 


Two boolean expressions are said to be equivalent if they have the same result 
on any truth assignment to their variables. If two expressions are equivalent, 
then surely either both are satisfiable or neither is. Thus, converting arbitrary 
expressions to equivalent CNF expressions is a promising approach to devel- 
oping a polynomial-time reduction from SAT to CSAT. That reduction would 
show CSAT to be NP-complete. 

However, things are not quite so simple. While we can convert any expres- 
sion to CNF, the conversion can take more than polynomial time. In particular, 
it may exponentiate the length of the expression, and thus surely take expo- 
nential time to generate the output. 

Fortunately, conversion of an arbitrary boolean expression to an expression 
in CNF is only one way that we might reduce SAT to CSAT, and thus prove 
CSAT is NP-complete. All we have to do is take a SAT instance E and convert 
it to a CSAT instance F such that F is satisfiable if and only if E is. It is not 
necessary that E and F be equivalent. It is not even necessary for E and F to 
have the same set of variables, and in fact, generally F will have a superset of 
the variables of E. 

The reduction of SAT to CSAT will consist of two parts. First, we push all
$\neg$’s down the expression tree so that the only negations are of variables; i.e., the
boolean expression becomes an AND and OR of literals. This transformation 
produces an equivalent expression and takes time that is at most quadratic 
in the size of the expression. On a conventional computer, with a carefully 
designed data structure, it takes only linear time. 

The second step is to write an expression that is the AND and OR of literals
as a product of clauses; i.e., to put it in CNF. By introducing new variables,
we are able to perform this transformation in time that is a polynomial in the
size of the given expression. The new expression F will not be equivalent to
the old expression E, in general. However, F will be satisfiable if and only if E
is. More specifically, if T is a truth assignment that makes E true, then there
is an extension of T, say S, that makes F true; we say S is an extension of T if
S assigns the same value as T to each variable that T assigns, but S may also
assign a value to variables that T does not mention.

[Table with columns “Expression” and “Rule”, showing each rewriting step.]

Figure 10.6: Pushing $\neg$’s down the expression tree so they appear only in literals

Our first step is to push $\neg$’s below $\wedge$’s and $\vee$’s. The rules we need are:


1. $\neg(E \wedge F) \Rightarrow \neg(E) \vee \neg(F)$. This rule, one of DeMorgan’s laws, allows us
to push $\neg$ below $\wedge$. Note that as a side-effect, the $\wedge$ is changed to an $\vee$.


2. $\neg(E \vee F) \Rightarrow \neg(E) \wedge \neg(F)$. The other “DeMorgan’s law” pushes $\neg$ below
$\vee$. The $\vee$ is changed to $\wedge$ as a side-effect.


3. $\neg(\neg(E)) \Rightarrow E$. This law of double negation cancels a pair of $\neg$’s that
apply to the same expression.


Example 10.11: Consider the expression $E = \neg\bigl((\neg(x + y))(\bar{x} + y)\bigr)$. Notice
that we have used a mixture of our two notations, with the $\neg$ operator used
explicitly when the expression to be negated is more than a single variable.
Figure 10.6 shows the steps in which expression E has all its $\neg$’s pushed down
until they become parts of literals.

The final expression is equivalent to the original and is an OR-and-AND 
expression of literals. It may be further simplified to the expression x + y, but 
that simplification is not essential to our claim that every expression can be 
rewritten so the $\neg$’s appear only in literals.
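The three rules translate directly into a recursive function. This is a minimal sketch, assuming a hypothetical nested-tuple encoding of expressions; it applies rule 3 to a double negation, rule 1 to a negated AND, and rule 2 to a negated OR, and leaves negated variables alone. Run on the expression of Example 10.11, it yields $x + y + x\bar{y}$, which is equivalent to $x + y$ as noted above.

```python
# Hypothetical encoding: ("var", name) | ("not", e) | ("and", e1, e2) | ("or", e1, e2)

def push_not(e):
    """Return an equivalent expression in which "not" is applied only to variables."""
    op = e[0]
    if op == "var":
        return e
    if op in ("and", "or"):
        return (op, push_not(e[1]), push_not(e[2]))
    if op != "not":
        raise ValueError(f"unknown operator {op!r}")
    g = e[1]
    if g[0] == "var":                     # already a literal
        return e
    if g[0] == "not":                     # rule 3: not(not E) => E
        return push_not(g[1])
    if g[0] == "and":                     # rule 1: not(E and F) => not E or not F
        return ("or", push_not(("not", g[1])), push_not(("not", g[2])))
    if g[0] == "or":                      # rule 2: not(E or F) => not E and not F
        return ("and", push_not(("not", g[1])), push_not(("not", g[2])))
    raise ValueError(f"unknown operator {g[0]!r}")


x, y = ("var", "x"), ("var", "y")
E = ("not", ("and", ("not", ("or", x, y)), ("or", ("not", x), y)))  # Example 10.11
print(push_not(E))
# ('or', ('or', ('var', 'x'), ('var', 'y')), ('and', ('var', 'x'), ('not', ('var', 'y'))))
```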


Theorem 10.12: Every boolean expression E is equivalent to an expression 
F in which the only negations occur in literals; i.e., they apply directly to 
variables. Moreover, the length of F is linear in the number of symbols of E, 
and F can be constructed from E in polynomial time. 


PROOF: The proof is an induction on the number of operators ($\wedge$, $\vee$, and
$\neg$) in E. We show that there is an equivalent expression F with $\neg$’s only in
literals. Additionally, if E has $n \ge 1$ operators, then F has no more than $2n - 1$
operators.

Since F need not have more than one pair of parentheses per operator, and 
the number of variables in an expression cannot exceed the number of operators 
by more than one, we conclude that the length of F is linearly proportional to
the length of E. More importantly, we shall see that, because the construction 
of F is quite simple, the time it takes to construct F is proportional to its 
length, and therefore proportional to the length of E. 


BASIS: If E has one operator, it must be of the form $\neg x$, $x \vee y$, or $x \wedge y$, for
variables x and y. In each case, E is already in the required form, so F = E
serves. Note that since E and F each have one operator, the relationship “F
has at most twice the number of operators of E, minus 1” holds.


INDUCTION: Suppose the statement is true for all expressions with fewer operators
than E. If the highest operator of E is not $\neg$, then E must be of the
form $E_1 \vee E_2$ or $E_1 \wedge E_2$. In either case, the inductive hypothesis applies to
$E_1$ and $E_2$; it says that there are equivalent expressions $F_1$ and $F_2$, respectively,
in which all $\neg$’s occur in literals only. Then $F = F_1 \vee F_2$ or $F = (F_1) \wedge (F_2)$
serves as a suitable equivalent for E. Let $E_1$ and $E_2$ have a and b operators,
respectively. Then E has $a + b + 1$ operators. By the inductive hypothesis, $F_1$
and $F_2$ have at most $2a - 1$ and $2b - 1$ operators, respectively. Thus, F has at
most $2a + 2b - 1$ operators, which is no more than $2(a + b + 1) - 1$, or twice the
number of operators of E, minus 1.

Now, consider the case where E is of the form $\neg E_1$. There are three cases,
depending on what the top operator of $E_1$ is. Note that $E_1$ must have an
operator, or E is really a basis case.


1. $E_1 = \neg E_2$. Then by the law of double negation, $E = \neg(\neg E_2)$ is equivalent
to $E_2$. Since $E_2$ has fewer operators than E, the inductive hypothesis
applies. We can find an equivalent F for $E_2$ in which the only $\neg$’s are in
literals. F serves for E as well. Since the number of operators of F is at
most twice the number in $E_2$ minus 1, it is surely no more than twice the
number of operators in E minus 1.


2. $E_1 = E_2 \vee E_3$. By DeMorgan’s law, $E = \neg(E_2 \vee E_3)$ is equivalent
to $(\neg(E_2)) \wedge (\neg(E_3))$. Both $\neg(E_2)$ and $\neg(E_3)$ have fewer operators
than E, so by the inductive hypothesis they have equivalents $F_2$ and $F_3$
that have $\neg$’s only in literals. Then $F = (F_2) \wedge (F_3)$ serves as such an
equivalent for E. We also claim that the number of operators in F is not
too great. Let $E_2$ and $E_3$ have a and b operators respectively. Then E has
$a + b + 2$ operators. Since $\neg(E_2)$ and $\neg(E_3)$ have $a + 1$ and $b + 1$ operators,
respectively, and $F_2$ and $F_3$ are constructed from these expressions, by the
inductive hypothesis we know that $F_2$ and $F_3$ have at most $2(a + 1) - 1$
and $2(b + 1) - 1$ operators, respectively. Thus, F has at most $2a + 2b + 3$
operators. This number is exactly twice the number of operators of E,
minus 1.


3. $E_1 = E_2 \wedge E_3$. This argument, using the other of DeMorgan’s laws (rule 1), is
essentially the same as (2).
Descriptions of Algorithms


While formally, the running time of a reduction is the time it takes to 
execute on a single-tape Turing machine, these algorithms are needlessly 
complex. We know that the sets of problems that can be solved on con- 
ventional computers, on multitape TM’s and on single tape TM’s in some 
polynomial time are the same, although the degrees of the polynomials 
may differ. Thus, as we describe some fairly sophisticated algorithms that 
are needed to reduce one NP-complete problem to another, let us agree 
that times will be measured by efficient implementations on a conventional 
computer. That understanding will allow us to avoid details regarding ma- 
nipulation of tapes and will let us emphasize the important algorithmic 
ideas. 


10.3.3 NP-Completeness of CSAT 


Now, we need to take an expression E that is the AND and OR of literals and 
convert it to CNF. As we mentioned, in order to produce in polynomial time 
an expression F from E that is satisfiable if and only if E is satisfiable, we 
must forgo an equivalence-preserving transformation, and introduce some new 
variables for F that do not appear in E. We shall introduce this “trick” in the
proof of the theorem that CSAT is NP-complete, and then give an example of 
the trick to make the construction clearer. 


Theorem 10.13: CSAT is NP-complete. 


PROOF: We show how to reduce SAT to CSAT in polynomial time. First, 
use the method of Theorem 10.12 to convert a given instance of SAT to an 
expression E whose $\neg$’s are only in literals. We then show how to convert E to
a CNF expression F in polynomial time and show that F is satisfiable if and
only if E is. The construction of F is by an induction on the length of E. The
particular property that F has is somewhat more than we need. Precisely, we
show by induction on the number of symbol occurrences (the “length”) of E that:


• There is a constant c such that if E is a boolean expression of length n
with $\neg$’s appearing only in literals, then there is an expression F such
that:

a) F is in CNF, and consists of at most n clauses.

b) F is constructible from E in time at most $c|E|^2$.

c) A truth assignment T for E makes E true if and only if there exists
an extension S of T that makes F true.


BASIS: If E consists of one or two symbols, then it is a literal. A literal is a 
clause, so E is already in CNF. 
INDUCTION: Assume that every expression shorter than E can be converted
to a product of clauses, and that this conversion takes at most $cn^2$ time on an
expression of length n. There are two cases, depending on the top-level operator 
of E. 


Case 1: $E = E_1 \wedge E_2$. By the inductive hypothesis, there are expressions
$F_1$ and $F_2$ derived from $E_1$ and $E_2$, respectively, in CNF. All and only the
satisfying assignments for $E_1$ can be extended to a satisfying assignment for
$F_1$, and similarly for $E_2$ and $F_2$. Without loss of generality, we may assume
that the variables of $F_1$ and $F_2$ are disjoint, except for those variables that
appear in E; i.e., if we have to introduce variables into $F_1$ and/or $F_2$, use
distinct variables.

Let $F = F_1 \wedge F_2$. Evidently $F_1 \wedge F_2$ is a CNF expression if $F_1$ and $F_2$ are.
We must show that a truth assignment T for E can be extended to a satisfying
assignment for F if and only if T satisfies E.


(If) Suppose T satisfies E. Let $T_1$ be T restricted so it applies only to the
variables that appear in $E_1$, and let $T_2$ be the same for $E_2$. Then by the
inductive hypothesis, $T_1$ and $T_2$ can be extended to assignments $S_1$ and $S_2$ that
satisfy $F_1$ and $F_2$, respectively. Let S agree with $S_1$ and $S_2$ on each of the
variables they define. Note that, since the only variables $F_1$ and $F_2$ have in
common are the variables of E, and $S_1$ and $S_2$ must agree on those variables if
both are defined, it is always possible to construct S. But S is then an extension
of T that satisfies F.


(Only-if) Conversely, suppose that T has an extension S that satisfies F. Let
$T_1$ (resp., $T_2$) be T restricted to the variables of $E_1$ (resp., $E_2$). Let S restricted
to the variables of $F_1$ (resp., $F_2$) be $S_1$ (resp., $S_2$). Then $S_1$ is an extension
of $T_1$, and $S_2$ is an extension of $T_2$. Because F is the AND of $F_1$ and $F_2$, it
must be that $S_1$ satisfies $F_1$, and $S_2$ satisfies $F_2$. By the inductive hypothesis,
$T_1$ (resp., $T_2$) must satisfy $E_1$ (resp., $E_2$). Thus, T satisfies E.


Case 2: $E = E_1 \vee E_2$. As in case 1, we invoke the inductive hypothesis to
assert that there are CNF expressions $F_1$ and $F_2$ with the properties:


1. A truth assignment for $E_1$ (resp., $E_2$) satisfies $E_1$ (resp., $E_2$) if and only
if it can be extended to a satisfying assignment for $F_1$ (resp., $F_2$).


2. The variables of $F_1$ and $F_2$ are disjoint, except for those variables that
appear in E.


3. $F_1$ and $F_2$ are in CNF.


We cannot simply take the OR of $F_1$ and $F_2$ to construct the desired F,
because the resulting expression would not be in CNF. However, a more complicated
construction, which takes advantage of the fact that we only want to
preserve satisfiability, rather than equivalence, will work. Suppose


$F_1 = g_1 \wedge g_2 \wedge \cdots \wedge g_p$
and $F_2 = h_1 \wedge h_2 \wedge \cdots \wedge h_q$, where the g’s and h’s are clauses. Introduce a
new variable y, and let


$F = (y + g_1) \wedge (y + g_2) \wedge \cdots \wedge (y + g_p) \wedge (\bar{y} + h_1) \wedge (\bar{y} + h_2) \wedge \cdots \wedge (\bar{y} + h_q)$


We must prove that a truth assignment T for E satisfies E if and only if T can 
be extended to a truth assignment S that satisfies F. 


(Only-if) Assume T satisfies E. As in Case 1, let $T_1$ (resp., $T_2$) be T restricted
to the variables of $E_1$ (resp., $E_2$). Since $E = E_1 \vee E_2$, either T satisfies $E_1$ or
T satisfies $E_2$. Let us assume T satisfies $E_1$. Then $T_1$, which is T restricted
to the variables of $E_1$, can be extended to $S_1$, which satisfies $F_1$. Construct an
extension S for T, as follows; S will satisfy the expression F defined above:


1. For all variables x in $F_1$, $S(x) = S_1(x)$.


2. $S(y) = 0$. This choice makes all the clauses of F that are derived from $F_2$
true.


3. For all variables x that are in $F_2$ but not in $F_1$, S(x) is T(x) if the latter
is defined, and otherwise may be 0 or 1, arbitrarily.


Then S makes all the clauses derived from the g’s true because of rule 1. S 
makes all the clauses derived from the h’s true by rule 2 — the truth assignment 
for y. Thus, S satisfies F. 

If T does not satisfy $E_1$, but satisfies $E_2$, then the argument is the same,
except $S(y) = 1$ in rule 2. Also, S(x) must agree with $S_2(x)$ whenever $S_2(x)$ is
defined, but S(x) for variables appearing only in $S_1$ is arbitrary. We conclude
that S satisfies F in this case also.


(If) Suppose that truth assignment T for E is extended to truth assignment S
for F, and S satisfies F. There are two cases, depending on what truth-value
is assigned to y. First suppose that $S(y) = 0$. Then all the clauses of F derived
from the h’s are true. However, y is no help for the clauses of the form $(y + g_i)$
that are derived from the g’s, which means that S must make true each of the
$g_i$’s themselves; in essence, S makes $F_1$ true.

More precisely, let $S_1$ be S restricted to the variables of $F_1$. Then $S_1$ satisfies
$F_1$. By the inductive hypothesis, $T_1$, which is T restricted to the variables of
$E_1$, must satisfy $E_1$. The reason is that $S_1$ is an extension of $T_1$. Since $T_1$
satisfies $E_1$, T must satisfy E, which is $E_1 \vee E_2$.

We must also consider the case that S(y) = 1, but this case is symmetric 
to what we have just seen, and we leave it to the reader. We conclude that T 
satisfies E whenever S satisfies F.


Now, we must show that the time to construct F from E is at most quadratic in n, the length of E. Regardless of which case applies, the splitting apart of E into $E_1$ and $E_2$, and the construction of F from $F_1$ and $F_2$, each take time that is linear in the size of E. Let dn be an upper bound on the time to construct $E_1$ and $E_2$ from E plus the time to construct F from $F_1$ and $F_2$, in either case 1 or case 2. Then there is a recurrence equation for T(n), the time to construct F from any E of length n; its form is:


$$T(1) = T(2) \le e \quad \text{for some constant } e$$
$$T(n) \le dn + \max_{0 < i < n-1}\bigl(T(i) + T(n-1-i)\bigr) \quad \text{for } n \ge 3$$


We shall find a constant c, as yet to be determined, such that we can show $T(n) \le cn^2$.
The basis rule for T(1) and T(2) simply says that if E is a single symbol or a pair of symbols, then we need no recursion because E can only be a single literal, and the entire process takes some amount of time e. The recursive rule uses the fact that if E is composed of subexpressions $E_1$ and $E_2$ connected by an operator $\land$ or $\lor$, and $E_1$ is of length i, then $E_2$ is of length $n - i - 1$. Moreover, the entire conversion of E to F consists of the two simple steps (changing E to $E_1$ and $E_2$, and changing $F_1$ and $F_2$ to F) that we know take time at most dn, plus the two recursive conversions of $E_1$ to $F_1$, and $E_2$ to $F_2$.

We need to show by induction on n that there is a constant c such that for all n, $T(n) \le cn^2$.


BASIS: For n = 1 (and hence also n = 2), we just need to pick c at least as large as e.


INDUCTION: Assume the statement for lengths less than n. Then $T(i) \le ci^2$ and $T(n-i-1) \le c(n-i-1)^2$. Since $i^2 + (n-i-1)^2 = n^2 - 2i(n-i) - 2(n-i) + 1$, we have


$$T(i) + T(n-i-1) \le c\bigl(n^2 - 2i(n-i) - 2(n-i) + 1\bigr) \qquad (10.1)$$


Since $n \ge 3$ and $0 < i < n-1$, $2i(n-i)$ is at least n, and $2(n-i)$ is at least 2. Thus, the right side of (10.1) is less than $c(n^2 - n)$, for any i in the allowed range. The recursive rule in the definition of T(n) thus says $T(n) \le dn + cn^2 - cn$. If we pick $c \ge d$, we may infer that $T(n) \le cn^2$ holds for n, which concludes the induction. Thus, the construction of F from E takes time $O(n^2)$.
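The recursive construction in this proof can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's own code: the expression is given as a parse tree whose negations have already been pushed down to the variables, and the tuple encoding and the function names (`to_cnf`, `eval_expr`, `satisfies`) are hypothetical. An AND node concatenates the clause lists of its operands (case 1); an OR node takes a fresh variable y, adds y to every clause from the left operand and its negation to every clause from the right operand (case 2). The usage applies it to the expression $x\bar{y} + \bar{x}(y + z)$ of Example 10.14, which follows, and checks by brute force that a truth assignment satisfies E exactly when it extends to one satisfying F.

```python
from itertools import product

# Hypothetical encoding: ('lit', name, positive), ('and', e1, e2), ('or', e1, e2).
# In the output, a literal is (name, positive), a clause is a tuple of literals,
# and a CNF expression is a list of clauses.

def to_cnf(expr, fresh):
    """Build the CNF expression F for expr, drawing new variable names from fresh."""
    kind = expr[0]
    if kind == 'lit':
        return [((expr[1], expr[2]),)]           # one clause holding the literal
    if kind == 'and':                             # case 1: product of both clause lists
        return to_cnf(expr[1], fresh) + to_cnf(expr[2], fresh)
    y = next(fresh)                               # case 2: new variable for this OR node
    left = to_cnf(expr[1], fresh)
    right = to_cnf(expr[2], fresh)
    return ([((y, True),) + c for c in left] +    # y added to the left operand's clauses
            [((y, False),) + c for c in right])   # negated y added to the right's

def eval_expr(expr, t):
    """Value of expr under the truth assignment t (a dict from names to bool)."""
    kind = expr[0]
    if kind == 'lit':
        return t[expr[1]] == expr[2]
    a, b = eval_expr(expr[1], t), eval_expr(expr[2], t)
    return (a and b) if kind == 'and' else (a or b)

def satisfies(cnf, s):
    """True if assignment s makes at least one literal of every clause true."""
    return all(any(s[name] == pos for name, pos in clause) for clause in cnf)

# E = x y' + x'(y + z), where a prime marks negation.
E = ('or',
     ('and', ('lit', 'x', True), ('lit', 'y', False)),
     ('and', ('lit', 'x', False), ('or', ('lit', 'y', True), ('lit', 'z', True))))
F = to_cnf(E, iter(['u', 'v']))   # root OR gets u, inner OR gets v

# T satisfies E exactly when some extension of T to u and v satisfies F.
for bits in product([False, True], repeat=3):
    t = dict(zip('xyz', bits))
    extendable = any(satisfies(F, {**t, 'u': a, 'v': b})
                     for a, b in product([False, True], repeat=2))
    assert eval_expr(E, t) == extendable
```

Taking the fresh variable before recursing is only a naming choice; it makes the root OR receive u and the inner OR receive v, so the clause list comes out in the same order as the hand derivation of F in the example.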


Example 10.14: Let us show how the construction of Theorem 10.13 applies to a simple expression: $E = x\bar{y} + \bar{x}(y + z)$. Figure 10.7 shows the parse of this expression. Attached to each node is the CNF expression constructed for the expression represented by that node.

The leaves correspond to the literals, and for each literal, the CNF expres- 
sion is one clause consisting of that literal alone. For instance, we see that 
the leaf labeled $\bar{y}$ has an associated CNF expression $(\bar{y})$. The parentheses are
unnecessary, but we put them in CNF expressions to help remind you that we 
are talking about a product of clauses. 

For an AND node, the construction of a CNF expression is simply to take the 
product (AND) of all the clauses for the two subexpressions. Thus, for instance, 
the node for the subexpression $\bar{x}(y + z)$ has an associated CNF expression that is the product of the one clause for $\bar{x}$, namely $(\bar{x})$, and the two clauses for $y + z$, namely $(v + y)(\bar{v} + z)$.⁴


⁴In this special case, where the subexpression $y + z$ is already a clause, we did not have to perform the general construction for the OR of expressions, and could have produced $(y + z)$ as the product of clauses equivalent to $y + z$. However, in this example, we stick to the general rules.


Figure 10.7: Transforming a boolean expression into CNF


For an OR node, we must introduce a new variable. We add it to all the clauses for the left operand, and we add its negation to the clauses for the right operand. For instance, consider the root node in Fig. 10.7. It is the OR of expressions $x\bar{y}$ and $\bar{x}(y + z)$, whose CNF expressions have been determined to be $(x)(\bar{y})$ and $(\bar{x})(v + y)(\bar{v} + z)$, respectively. We introduce a new variable u, which is added without negation to the first group of clauses and negated in the second group. The result is


$$F = (u + x)(u + \bar{y})(\bar{u} + \bar{x})(\bar{u} + v + y)(\bar{u} + \bar{v} + z)$$


Theorem 10.13 tells us that any truth assignment T that satisfies E can be extended to a truth assignment S that satisfies F. For instance, the assignment T(x) = 0, T(y) = 1, and T(z) = 1 satisfies E. We can extend T to S by adding S(u) = 1 and S(v) = 0 to the required S(x) = 0, S(y) = 1, and S(z) = 1 that we get from T. You may check that S satisfies F.

Notice that in choosing S, we were required to pick S(u) = 1, because T makes only the second part of E, that is $\bar{x}(y + z)$, true. Thus, we need S(u) = 1 to make true the clauses $(u + x)(u + \bar{y})$, which come from the first part of E. However, we could pick either value for v, because in the subexpression $y + z$, both sides of the OR are true according to T.


10.3.4 NP-Completeness of 3SAT


Now, we show an even smaller class of boolean expressions with an NP-complete satisfiability problem. Recall the problem 3SAT is:


• Given a boolean expression E that is the product of clauses, each of which is the sum of three distinct literals, is E satisfiable?
Although the 3-CNF expressions are a small fraction of the CNF expressions,
they are complex enough to make their satisfiability test NP-complete, as the 
next theorem shows. 


Theorem 10.15: 3SAT is NP-complete. 


PROOF: Evidently 3SAT is in NP, since SAT is in NP. To prove NP-completeness, we shall reduce CSAT to 3SAT. The reduction is as follows. Given a CNF expression $E = e_1 \land e_2 \land \cdots \land e_k$, we replace each clause $e_i$ as follows, to create a new expression F. The time taken to construct F is linear in the length of E, and we shall see that a truth assignment satisfies E if and only if it can be extended to a satisfying truth assignment for F.


1. If $e_i$ is a single literal, say (x),⁵ introduce two new variables u and v. Replace (x) by the four clauses $(x + u + v)(x + u + \bar{v})(x + \bar{u} + v)(x + \bar{u} + \bar{v})$. Since u and v appear in all combinations, the only way to satisfy all four clauses is to make x true. Thus, all and only the satisfying assignments for E can be extended to a satisfying assignment for F.


2. Suppose $e_i$ is the sum of two literals, $(x + y)$. Introduce a new variable z, and replace $e_i$ by the product of two clauses $(x + y + z)(x + y + \bar{z})$. As in case 1, the only way to satisfy both clauses is to satisfy $(x + y)$.


3. If $e_i$ is the sum of three literals, it is already in the form required for 3-CNF, so we leave $e_i$ in the expression F being constructed.


4. Suppose $e_i = (x_1 + x_2 + \cdots + x_m)$ for some $m \ge 4$. Introduce new variables $y_1, y_2, \ldots, y_{m-3}$ and replace $e_i$ by the product of clauses


$$(x_1 + x_2 + y_1)(x_3 + \bar{y}_1 + y_2)(x_4 + \bar{y}_2 + y_3)\cdots(x_{m-2} + \bar{y}_{m-4} + y_{m-3})(x_{m-1} + x_m + \bar{y}_{m-3}) \qquad (10.2)$$


An assignment T that satisfies E must make at least one literal of $e_i$ true; say it makes $x_j$ true (recall $x_j$ could be a variable or a negated variable). Then, if we make $y_1, y_2, \ldots, y_{j-2}$ true and make $y_{j-1}, y_j, \ldots, y_{m-3}$ false, we satisfy all the clauses of (10.2). Thus, T may be extended to satisfy these clauses. Conversely, if T makes all the x’s false, it is not possible to extend T to make (10.2) true. The reason is that there are $m - 2$ clauses, and each of the $m - 3$ y’s can only make one clause true, regardless of whether it is true or false.


We have thus shown how to reduce each instance E of CSAT to an instance F of 3SAT, such that F is satisfiable if and only if E is satisfiable. The construction evidently requires time that is linear in the length of E, because none of the four cases above expands a clause by more than a factor 32/3 (that is the ratio of symbol counts in case 1), and it is easy to calculate the needed symbols of F in time proportional to the number of those symbols. Since CSAT is NP-complete, it follows that 3SAT is likewise NP-complete.


⁵For convenience, we shall talk of literals as if they were unnegated variables, like x. However, the constructions apply equally well if some or all of the literals are negated, like $\bar{x}$.
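The four cases of this reduction can be sketched in Python as follows. This is a minimal sketch under stated assumptions: literals are encoded as (name, positive) pairs and clauses as tuples, the fresh variable names `_n0`, `_n1`, ... are hypothetical and assumed not to occur in the input, and the function names `clause_to_3cnf`, `csat_to_3sat`, and `satisfies` are illustrative, not from the text. The usage checks by brute force, on a small expression with clauses of one, two, three, and five literals, that a truth assignment satisfies E exactly when it extends to one satisfying F.

```python
from itertools import count, product

# Hypothetical encoding: a literal is (name, positive); a clause is a tuple of
# literals; a CNF expression is a list of clauses.

def clause_to_3cnf(clause, fresh):
    """Replace one clause by 3-literal clauses, following cases 1-4 above."""
    m = len(clause)
    if m == 1:                                    # case 1: (x) becomes four clauses over u, v
        x = clause[0]
        u, v = next(fresh), next(fresh)
        return [(x, (u, su), (v, sv))
                for su in (True, False) for sv in (True, False)]
    if m == 2:                                    # case 2: (x + y) becomes two clauses over z
        x, y = clause
        z = next(fresh)
        return [(x, y, (z, True)), (x, y, (z, False))]
    if m == 3:                                    # case 3: already in 3-CNF
        return [tuple(clause)]
    ys = [next(fresh) for _ in range(m - 3)]      # case 4: chain y_1, ..., y_{m-3}
    out = [(clause[0], clause[1], (ys[0], True))]
    for k in range(1, m - 3):                     # (x_{k+2} + not y_k + y_{k+1})
        out.append((clause[k + 1], (ys[k - 1], False), (ys[k], True)))
    out.append((clause[m - 2], clause[m - 1], (ys[-1], False)))
    return out

def csat_to_3sat(cnf):
    """Apply clause_to_3cnf to every clause, sharing one supply of fresh names."""
    fresh = (f'_n{i}' for i in count())
    return [c for clause in cnf for c in clause_to_3cnf(clause, fresh)]

def satisfies(cnf, s):
    """True if assignment s makes at least one literal of every clause true."""
    return all(any(s[name] == pos for name, pos in clause) for clause in cnf)

E = [(('a', True),),
     (('a', False), ('b', True)),
     (('b', False), ('c', True), ('d', False)),
     (('a', False), ('b', False), ('c', False), ('d', True), ('e', True))]
F = csat_to_3sat(E)

old = ['a', 'b', 'c', 'd', 'e']
new = sorted({name for clause in F for name, _ in clause} - set(old))
for bits in product([False, True], repeat=len(old)):
    t = dict(zip(old, bits))
    extendable = any(satisfies(F, {**t, **dict(zip(new, more))})
                     for more in product([False, True], repeat=len(new)))
    assert satisfies(E, t) == extendable
```

Every output clause has exactly three literals over distinct variables, and the number of clauses produced for a long clause is $m - 2$, matching the count used in the argument above.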


10.3.5 Exercises for Section 10.3 
Exercise 10.3.1: Put the following boolean expressions into 3-CNF: 
* a) cy +z. 

b) wayz +u +v. 

c) wry + Tw. 


Exercise 10.3.2: The problem 4TA-SAT is defined as follows: Given a boolean expression E, does E have at least four satisfying truth assignments? Show that 4TA-SAT is NP-complete.


Exercise 10.3.3: In this exercise, we shall define a family of 3-CNF expressions. The expression $E_n$ has n variables, $x_1, x_2, \ldots, x_n$. For each set of three distinct integers between 1 and n, say i, j, and k, $E_n$ has clauses $(x_i + x_j + x_k)$ and $(\bar{x}_i + \bar{x}_j + \bar{x}_k)$. Is $E_n$ satisfiable for:


* a) n = 4?

b) n = 5?


Exercise 10.3.4: Give a polynomial-time algorithm to solve the problem 
2SAT, i.e., satisfiability for CNF boolean expressions with only two literals 
per clause. Hint: If one of two literals in a clause is false, the other is forced to 
be true. Start with an assumption about the truth of one variable, and chase 
down all the consequences for other variables. 


10.4 Additional NP-Complete Problems 


We shall now give you a small sample of the process whereby one NP-complete 
problem leads to proofs that other problems are also NP-complete. This process 
of discovering new NP-complete problems has two important effects: 


• When we discover a problem to be NP-complete, it tells us that there
is little chance an efficient algorithm can be developed to solve it. We 
are encouraged to look for heuristics, partial solutions, approximations, 
or other ways to avoid attacking the problem head-on. Moreover, we can 
do so with confidence that we are not just “missing the trick.” 


• Each time we add a new NP-complete problem P to the list, we reinforce the idea that all NP-complete problems require exponential time. The effort that has undoubtedly gone into finding a polynomial-time algorithm for problem P was, unknowingly, effort devoted to showing P = NP. It is the accumulated weight of the unsuccessful attempts by many skilled scientists and mathematicians to show something that is tantamount to P = NP that ultimately convinces us that it is very unlikely that P = NP, but rather that all the NP-complete problems require exponential time.


In this section, we meet several NP-complete problems involving graphs. 
These problems are among those graph problems most commonly used in the 
solution to questions of practical importance. We shall talk about the Traveling 
Salesman problem (TSP), which we met earlier in Section 10.1.4. We shall show 
that a simpler, and also important version, called the Hamilton-Circuit problem 
(HC), is NP-complete, thus showing that the more general TSP is NP-complete. 
We introduce several other problems involving “covering” of graphs, such as
the “node-cover problem,” which asks us to find the smallest set of nodes that 
“cover” all the edges, in the sense that at least one end of every edge is in the 
selected set. 


10.4.1 Describing NP-complete Problems 


As we introduce new NP-complete problems, we shall use a stylized form of 
definition, as follows: 


1. The name of the problem, and usually an abbreviation, like 3SAT or TSP. 
2. The input to the problem: what is represented, and how. 


3. The output desired: under what circumstances should the output be “yes”?


4. The problem from which a reduction is made to prove the problem NP- 
complete. 


Example 10.16: Here is how the description of the problem 3SAT and its 
proof of NP-completeness might look: 


PROBLEM: Satisfiability for 3-CNF expressions (3SAT). 
INPUT: A boolean expression in 3-CNF. 


OUTPUT: “Yes” if and only if the expression is satisfiable. 


REDUCTION FROM: CSAT. 


10.4.2 The Problem of Independent Sets 


Let G be an undirected graph. We say a subset I of the nodes of G is an independent set if no two nodes of I are connected by an edge of G. An independent
set is maximal if it is as large (has as many nodes) as any independent set for 
the same graph. 
Example 10.17: In the graph of Fig. 10.1 (see Section 10.1.2), {1, 4} is a
maximal independent set. It is the only set of size two that is independent, 
because there is an edge between any other pair of nodes. Thus, no set of size 
three or more is independent; for instance, {1, 2,4} is not independent because 
there is an edge between 1 and 2. Thus, {1,4} is a maximal independent set. In 
fact, it is the only maximal independent set for this graph, although in general 
a graph may have many maximal independent sets. As another example, {1} 
is an independent set for this graph, but not maximal. 


In combinatorial optimization, the maximal-independent-set problem is usu- 
ally stated as: given a graph, find a maximal independent set. However, as with 
all problems in the theory of intractable problems, we need to state our problem 
in yes/no terms. Thus, we need to introduce a lower bound into the statement 
of the problem, and we phrase the question as whether a given graph has an 
independent set at least as large as the bound. The formal definition of the 
maximal-independent-set problem is: 


PROBLEM: Independent Set (IS). 


INPUT: A graph G and a lower bound k, which must be between 1 and the 
number of nodes of G. 


OUTPUT: “Yes” if and only if G has an independent set of k nodes. 
REDUCTION FROM: 3SAT. 


We must prove IS to be NP-complete by a polynomial-time reduction from 
3SAT, as promised. That reduction is in the next theorem. 


Theorem 10.18: The independent-set problem is NP-complete. 


PROOF: First, it is easy to see that IS is in NP. Given a graph G and a bound
k, guess k nodes and check that they are independent. 

Now, let us show how to perform the reduction of 3SAT to IS. Let $E = (e_1)(e_2)\cdots(e_m)$ be a 3-CNF expression. We construct from E a graph G with 3m nodes, which we shall give the names [i, j], where $1 \le i \le m$ and j = 1, 2, or 3. The node [i, j] represents the jth literal in the clause $e_i$. Figure 10.8 is an example of a graph G, based on the 3-CNF expression


$$(x_1 + x_2 + x_3)(\bar{x}_1 + x_2 + x_4)(\bar{x}_2 + x_3 + x_5)(\bar{x}_3 + \bar{x}_4 + \bar{x}_5)$$


The columns represent the clauses; we shall explain shortly why the edges are 
as they are. 

The “trick” behind the construction of G is to use edges to force any independent set with m nodes to represent a way to satisfy the expression E. There are two key ideas.


1. We want to make sure that only one node corresponding to a given clause can be chosen. We do so by putting edges between all pairs of nodes in a column, i.e., we create the edges ([i, 1], [i, 2]), ([i, 1], [i, 3]), and ([i, 2], [i, 3]), for all i, as in Fig. 10.8.
Figure 10.8: Construction of an independent set from a satisfiable boolean expression in 3-CNF


2. We must prevent nodes from being chosen for the independent set if they represent literals that are complementary. Thus, if there are two nodes $[i_1, j_1]$ and $[i_2, j_2]$, such that one of them represents a variable x, and the other represents $\bar{x}$, we place an edge between these two nodes. Thus, it is not possible to choose both of these nodes for an independent set.


The bound k for the graph G constructed by these two rules is m.

It is not hard to see how graph G and bound k can be constructed from
expression E in time that is proportional to the square of the length of E, so 
the conversion of E to G is a polynomial-time reduction. We must show that 
it correctly reduces 3SAT to IS. That is: 


• E is satisfiable if and only if G has an independent set of size m.


(If) First, observe that an independent set may not include two nodes from the same clause, $[i, j_1]$ and $[i, j_2]$ for some $j_1 \ne j_2$. The reason is that there
are edges between each pair of such nodes, as we observe from the columns in 
Fig. 10.8. Thus, if there is an independent set of size m, this set must include 
exactly one node from each clause. 

Moreover, the independent set may not include nodes that correspond to both a variable x and its negation $\bar{x}$. The reason is that all pairs of such nodes also have an edge between them. Thus, the independent set I of size m yields a satisfying truth assignment T for E as follows. If a node corresponding to a variable x is in I, then make T(x) = 1; if a node corresponding to a negated variable $\bar{x}$ is in I, then choose T(x) = 0. If there is no node in I that corresponds to either x or $\bar{x}$, then pick T(x) arbitrarily. Note that item (2) above explains why there cannot be a contradiction, with nodes corresponding to both x and $\bar{x}$ in I.
Are Yes-No Problems Easier?


We might worry that a yes/no version of a problem is easier than the 
optimization version. For instance, it might be hard to find a largest 
independent set, but given a small bound k, it might be easy to verify 
that there is an independent set of size k. While true, it is also the case 
that we might be given a constant k that is exactly the largest size for which
an independent set exists. If so, then solving the yes/no version requires 
us to find a maximal independent set. 

In fact, for all the common problems that are NP-complete, their 
yes/no versions and optimization versions are equivalent in complexity, at 
least to within a polynomial. Typically, as in the case of IS, if we had 
a polynomial-time algorithm to find maximal independent sets, then we 
could solve the yes/no problem by finding a maximal independent set, 
and seeing if it was at least as large as the limit k. Since we shall show 
the yes/no version is NP-complete, the optimization version must be in- 
tractable as well. 

The comparison can also be made the other way. Suppose we had a 
polynomial-time algorithm for the yes/no problem IS. If the graph has n 
nodes, the size of the maximal independent set is between 1 and n. By 
running IS with all bounds between 1 and n, we can surely find the size 
of a maximal independent set (although not necessarily the set itself) in 
n times the amount of time it takes to solve IS once. In fact, by using 
binary search, we need only a $\log_2 n$ factor in the running time.
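The binary-search argument can be sketched in Python, assuming a yes/no oracle for IS. Here the hypothetical oracle `has_independent_set` is implemented by brute force (exponential time) only so that the sketch runs; the point is that `max_independent_set_size` asks it about $\log_2 n$ questions. The search relies on monotonicity: an independent set of size k contains independent sets of every smaller size, so the “yes” answers form a prefix $1, 2, \ldots, K$.

```python
from itertools import combinations

def has_independent_set(nodes, edges, k):
    """Hypothetical yes/no oracle for IS, done by brute force so the sketch runs."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(pair) not in edge_set for pair in combinations(subset, 2))
               for subset in combinations(nodes, k))

def max_independent_set_size(nodes, edges, oracle=has_independent_set):
    """Size of a largest independent set, using O(log n) calls to the yes/no oracle.

    Assumes the graph has at least one node, so the answer for k = 1 is "yes".
    """
    lo, hi = 1, len(nodes)          # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2    # round up so the loop always makes progress
        if oracle(nodes, edges, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

path = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(max_independent_set_size(*path))   # prints 2, e.g. for the set {1, 3}
```

As the text notes, this recovers only the size; finding the set itself would need further calls.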


We claim that T satisfies E. The reason is that each clause of E has the node corresponding to one of its literals in I, and T is chosen so that literal is made true by T. Thus, when an independent set of size m exists, E is satisfiable.


(Only-if) Now suppose E is satisfied by some truth assignment, say T. Since T 
makes each clause of E true, we can identify one literal from each clause that 
T makes true. For some clauses, we may have a choice of two or three of the 
literals, and if so, pick one of them arbitrarily. Construct a set of m nodes I by 
picking the node corresponding to the selected literal from each clause. 

We claim I is an independent set. The edges between nodes that come from the same clause (the columns in Fig. 10.8) cannot have both ends in I, because we pick only one node from each clause. An edge connecting a variable and its negation cannot have both ends in I, because we selected for I only nodes that correspond to literals made true by the truth assignment T. Of course T will make one of x and $\bar{x}$ true, but never both. We conclude that if E is satisfiable, then G has an independent set of size m.

Thus, there is a polynomial time reduction from 3SAT to IS. Since 3SAT is 
known to be NP-complete, so is IS by Theorem 10.5. 
What are Independent Sets Good For?


It is not the purpose of this book to cover applications of the problems we 
prove NP-complete. However, the selection of problems in Section 10.4 was 
taken from a fundamental paper on NP-completeness by R. Karp, where 
he examined the most important problems from the field of Operations 
Research and showed a good number of them to be NP-complete. Thus, 
there is ample evidence available of “real” problems that are solved using 
these abstract problems. 

As an example, we could use a good algorithm for finding large inde- 
pendent sets to schedule final exams. Let the nodes of the graph be the 
classes, and place an edge between two nodes if one or more students are 
taking both those classes, and therefore their finals could not be scheduled 
for the same time. If we find a maximal independent set, then we can 
schedule all those classes for finals at the same time, sure that no student 
will have a conflict. 


Example 10.19: Let us see how the construction of Theorem 10.18 works for 
the case where 


$$E = (x_1 + x_2 + x_3)(\bar{x}_1 + x_2 + x_4)(\bar{x}_2 + x_3 + x_5)(\bar{x}_3 + \bar{x}_4 + \bar{x}_5)$$


We already saw the graph obtained from this expression in Fig. 10.8. The 
nodes are in four columns corresponding to the four clauses. We have shown 
for each node not only its name (a pair of integers), but the literal to which 
it corresponds. Notice how there are edges between each pair of nodes in a 
column, which corresponds to the literals of one clause. There are also edges 
between each pair of nodes that corresponds to a variable and its complement. 
For instance, the node [3, 1], which corresponds to $\bar{x}_2$, has edges to the two nodes, [1, 2] and [2, 2], each of which corresponds to an occurrence of $x_2$.

We have selected, by boldface outline, a set I of four nodes, one from each 
column. These evidently form an independent set. Since their four literals are 
$x_1$, $x_2$, $x_3$, and $\bar{x}_4$, we can construct from them a truth assignment T that has $T(x_1) = 1$, $T(x_2) = 1$, $T(x_3) = 1$, and $T(x_4) = 0$. There must also be an assignment for $x_5$, but we may pick that arbitrarily, say $T(x_5) = 0$. Now T satisfies E, and the set of nodes I indicates a literal from each clause that is made true by T.
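The two edge rules of Theorem 10.18 can be sketched in Python and checked against this example. This is a minimal sketch under stated assumptions: the encoding (a literal as a (variable, positive) pair, node [i, j] as the tuple (i, j), edges as frozensets) and the names `three_sat_to_is` and `is_independent` are hypothetical. The second function is also the polynomial-time “check” step used to show IS is in NP.

```python
from itertools import combinations

def three_sat_to_is(clauses):
    """Build (nodes, edges, k) for Theorem 10.18 from a 3-CNF expression.

    Node (i, j) stands for the j-th literal of clause i, both counted from 1
    as in the text; each literal is a (variable, positive) pair.
    """
    nodes = [(i, j) for i in range(1, len(clauses) + 1) for j in (1, 2, 3)]
    literal = {(i, j): clauses[i - 1][j - 1] for (i, j) in nodes}
    edges = set()
    for a, b in combinations(nodes, 2):
        (va, pa), (vb, pb) = literal[a], literal[b]
        same_clause = a[0] == b[0]                 # rule 1: edges within a column
        complementary = va == vb and pa != pb      # rule 2: x and its negation
        if same_clause or complementary:
            edges.add(frozenset((a, b)))
    return nodes, edges, len(clauses)              # the bound k is m

def is_independent(chosen, edges):
    """True if no edge joins two nodes of chosen."""
    return all(frozenset(p) not in edges for p in combinations(chosen, 2))

# The expression of Example 10.19; False marks a negated literal.
E = [(('x1', True), ('x2', True), ('x3', True)),
     (('x1', False), ('x2', True), ('x4', True)),
     (('x2', False), ('x3', True), ('x5', True)),
     (('x3', False), ('x4', False), ('x5', False))]
nodes, edges, k = three_sat_to_is(E)
I = [(1, 1), (2, 2), (3, 2), (4, 2)]   # literals x1, x2, x3, and negated x4
print(k, is_independent(I, edges))     # 4 True
```

Comparing every pair of nodes makes the construction quadratic in the number of literals, in line with the time bound claimed in the proof.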


10.4.3 The Node-Cover Problem 


Another important class of combinatorial optimization problems involves “cov- 
ering” of a graph. For instance, an edge covering is a set of edges such that 
every node in the graph is an end of at least one edge in the set. An edge 
covering is minimal if it has as few edges as any edge covering for the same
graph. It is possible to find a minimal edge covering in time that is polynomial 
in the size of the graph, although we shall not prove this fact here. 

We shall prove NP-complete the problem of node covering. A node cover 
of a graph is a set of nodes such that each edge has at least one of its ends at 
a node of the set. A node cover is minimal if it has as few nodes as any node 
cover for the given graph. 

Node covers and independent sets are closely related. In fact, the comple- 
ment of an independent set is a node cover, and vice-versa. Thus, if we state 
the yes/no version of the node-cover problem (NC) properly, a reduction from 
IS is very simple. 


PROBLEM: The Node-Cover Problem (NC). 


INPUT: A graph G and an upper limit k, which must be between 0 and one 
less than the number of nodes of G. 


OUTPUT: “Yes” if and only if G has a node cover with k or fewer nodes. 


REDUCTION FROM: Independent Set. 


Theorem 10.20: The node-cover problem is NP-complete. 


PROOF: Evidently, NC is in NP. Guess a set of k nodes, and check that each
edge of G has at least one end in the set. 

To complete the proof, we shall reduce IS to NC. The idea, which is suggested 
by Fig. 10.8, is that the complement of an independent set is a node cover. For 
instance, the set of nodes that do not have boldface outlines in Fig. 10.8 form 
a node cover. Since the boldface nodes are in fact a maximal independent set, 
the other nodes form a minimal node cover. 

The reduction is as follows. Let G with lower limit k be an instance of the independent-set problem. If G has n nodes, let G with upper limit $n - k$ be the instance of the node-cover problem we construct. Evidently this transformation can be accomplished in linear time. We claim that


• G has an independent set of size k if and only if G has a node cover of size $n - k$.


(If) Let N be the set of nodes of G, and let C be the node cover of size $n - k$. We claim that $N - C$ is an independent set. Suppose not; that is, there is a pair of nodes v and w in $N - C$ that has an edge between them in G. Then since neither v nor w is in C, the edge (v, w) in G is not covered by the alleged node cover C. We have proved by contradiction that $N - C$ is an independent set. Evidently, this set has k nodes, so this direction of the proof is complete.


(Only-if) Suppose I is an independent set of k nodes. We claim that $N - I$ is a node cover with $n - k$ nodes. Again, we proceed by contradiction. If there is some edge (v, w) not covered by $N - I$, then both v and w are in I, yet are connected by an edge, which contradicts the definition of an independent set.
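The key fact of this proof, that I is independent exactly when $N - I$ is a node cover, can be checked exhaustively on a small graph. This is a minimal sketch: the graph (a 5-cycle with one chord) is an arbitrary example, and the names `is_independent`, `is_node_cover`, and `is_to_nc` are hypothetical.

```python
from itertools import combinations

def is_independent(subset, edges):
    """No edge has both of its ends in subset."""
    return not any(u in subset and v in subset for u, v in edges)

def is_node_cover(subset, edges):
    """Every edge has at least one of its ends in subset."""
    return all(u in subset or v in subset for u, v in edges)

def is_to_nc(nodes, edges, k):
    """The reduction of Theorem 10.20: (G, lower limit k) maps to (G, upper limit n - k)."""
    return nodes, edges, len(nodes) - k

# Check the complement fact for every subset of a small graph.
nodes = {1, 2, 3, 4, 5}
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)]
for r in range(len(nodes) + 1):
    for subset in map(set, combinations(nodes, r)):
        assert is_independent(subset, edges) == is_node_cover(nodes - subset, edges)
```

The reduction itself changes only the bound; the graph is passed through unchanged, which is why it runs in linear time.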
10.4.4 The Directed Hamilton-Circuit Problem


We would like to show NP-complete the Traveling Salesman Problem (TSP), 
because this problem is one of great interest in combinatorics. The best known 
proof of its NP-completeness is actually a proof that a simpler problem, called 
the “Hamilton-Circuit Problem” (HC) is NP-complete. The Hamilton-Circuit 
Problem can be described as follows: 


PROBLEM: Hamilton-Circuit Problem. 
INPUT: An undirected graph G. 


OUTPUT: “Yes” if and only if G has a Hamilton circuit, that is, a cycle that 
passes through each node of G exactly once. 


Notice that the HC problem is a special case of the TSP, in which all the weights 
on the edges are 1. Thus, a polynomial-time reduction of HC to TSP is very 
simple: just add a weight of 1 to the specification of each edge in the graph. 

The proof of NP-completeness for HC is very hard. Our approach is to 
introduce a more constrained version of HC, in which the edges have directions 
(i.e., they are directed edges, or arcs), and the Hamilton circuit is required to 
follow arcs in the proper direction. We reduce 3SAT to this directed version of 
the HC problem, then reduce it to the standard, or undirected, version of HC. 
Formally: 


PROBLEM: The Directed Hamilton-Circuit Problem (DHC). 
INPUT: A directed graph G.


OUTPUT: “Yes” if and only if there is a directed cycle in G that passes through 
each node exactly once. 


REDUCTION FROM: 3SAT. 


Theorem 10.21: The Directed Hamilton-Circuit Problem is NP-complete. 


PROOF: The proof that DHC is in NP is easy; guess a cycle and check that all the arcs it needs are present in the graph. We must reduce 3SAT to DHC, and this reduction requires the construction of a complicated graph, with “gadgets,” or specialized subgraphs, representing each variable and each clause of the 3SAT instance.

To begin the construction of a DHC instance from a 3-CNF boolean expression, let the expression be $E = e_1 \land e_2 \land \cdots \land e_k$, where each $e_i$ is a clause, the sum of three literals, say $e_i = (\alpha_{i1} + \alpha_{i2} + \alpha_{i3})$. Let $x_1, x_2, \ldots, x_n$ be the variables of E. For each clause and for each variable, we construct a “gadget,” suggested in Fig. 10.9.

For each variable $x_i$ we construct a subgraph $H_i$ with the structure shown in Fig. 10.9(a). Here, $m_i$ is the larger of the number of occurrences of $x_i$ and the number of occurrences of $\bar{x}_i$ in E. In the two columns of nodes, the b’s and the c’s, there are arcs between $b_{ij}$ and $c_{ij}$ in both directions. Also, each of the b’s has an arc to the c below it; i.e., $b_{ij}$ has an arc to $c_{i,j+1}$, as long as $j < m_i$. Likewise, $c_{ij}$ has an arc to $b_{i,j+1}$, for $j < m_i$. Finally, there is a head node $a_i$ with arcs to both $b_{i0}$ and $c_{i0}$, and a foot node $d_i$, with arcs from $b_{im_i}$ and $c_{im_i}$.


Figure 10.9: Constructions used in the proof that the Hamilton-circuit problem is NP-complete

Figure 10.9(b) outlines the structure of the entire graph. Each hexagon 
represents one of the gadgets for a variable, with the structure of Fig. 10.9(a). 
The foot node of one gadget has an arc to the head node of the next gadget, in 
a cycle. 

Suppose we had a directed Hamilton circuit for the graph of Fig. 10.9(b). We may as well suppose the cycle starts at $a_1$. If it next goes to $b_{10}$, we claim it must then go to $c_{10}$, for if not, then $c_{10}$ could never appear on the cycle. In proof, note that if the cycle goes from $a_1$ to $b_{10}$ to $c_{11}$, then as both predecessors of $c_{10}$ (that is, $a_1$ and $b_{10}$) are already on the cycle, the cycle can never include $c_{10}$.

Thus, if the cycle begins $a_1, b_{10}$, then it must continue down the “ladder,” alternating between the sides, as


$$a_1, b_{10}, c_{10}, b_{11}, c_{11}, \ldots, b_{1m_1}, c_{1m_1}, d_1$$


If the cycle begins with $a_1, c_{10}$, then the ladder is descended in an order where the c at a level precedes the b as:


$$a_1, c_{10}, b_{10}, c_{11}, b_{11}, \ldots, c_{1m_1}, b_{1m_1}, d_1$$


A crucial point in the proof is that we can treat the first order, where descent is from c’s to lower b’s, as if the variable corresponding to the gadget is made
true, while the order in which descent is from b’s to the lower c’s corresponds 
to making that variable false. 

After traversing the gadget $H_1$, the cycle must go to $a_2$, where there is another choice: go to $b_{20}$ or $c_{20}$ next. However, as we argued for $H_1$, once we make a choice of whether to go left or right from $a_2$, the path through $H_2$ is fixed. In general, when we enter each $H_i$ we have a choice of going left or right,
but no other choices if we are not to render a node inaccessible (i.e., the node 
cannot appear on a directed Hamilton circuit, because all of its predecessors 
have appeared already). 

In what follows, it helps to think of making the choice of going from $a_i$ to $b_{i0}$ as making variable $x_i$ true, while choosing to go from $a_i$ to $c_{i0}$ is tantamount to making $x_i$ false. Thus, the graph of Fig. 10.9(b) has exactly $2^n$ directed Hamilton circuits, corresponding to the $2^n$ truth assignments to n variables.
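The claim that the skeleton of Fig. 10.9(b) has exactly $2^n$ directed Hamilton circuits can be checked by brute force on small cases. This sketch uses a hypothetical encoding that is not from the text: nodes are tuples such as ('b', i, j), gadgets are indexed from 0 rather than 1, and the list `ms` gives the values $m_i$. It builds only the arcs described above for the $H_i$ gadgets plus the foot-to-head arcs, then counts Hamilton circuits by exhaustive search, which takes exponential time and is meant only for tiny graphs.

```python
def skeleton(ms):
    """Arcs of the skeleton graph: gadget H_i for each i, with m_i = ms[i].

    Returns a dict mapping every node to the list of its successors.
    """
    arcs = {}
    def add(u, v):
        arcs.setdefault(u, []).append(v)
        arcs.setdefault(v, [])
    n = len(ms)
    for i, m in enumerate(ms):
        add(('a', i), ('b', i, 0))                 # head node a_i
        add(('a', i), ('c', i, 0))
        for j in range(m + 1):
            add(('b', i, j), ('c', i, j))          # arcs between b_ij and c_ij both ways
            add(('c', i, j), ('b', i, j))
            if j < m:
                add(('b', i, j), ('c', i, j + 1))  # b to the c below
                add(('c', i, j), ('b', i, j + 1))  # c to the b below
        add(('b', i, m), ('d', i))                 # foot node d_i
        add(('c', i, m), ('d', i))
        add(('d', i), ('a', (i + 1) % n))          # foot to the next head, in a cycle
    return arcs

def count_hamilton_circuits(arcs, start):
    """Count directed Hamilton circuits through start by exhaustive search."""
    total = len(arcs)
    def extend(node, visited):
        if len(visited) == total:
            return 1 if start in arcs[node] else 0
        return sum(extend(nxt, visited | {nxt})
                   for nxt in arcs[node] if nxt not in visited)
    return extend(start, {start})

for ms in ([1], [1, 1], [2, 1, 1]):
    print(len(ms), count_hamilton_circuits(skeleton(ms), ('a', 0)))
# prints 1 2, then 2 4, then 3 8: two circuits per gadget, 2**n in all
```

Fixing the start node counts each circuit once; the search discards every partial path that strands a node whose predecessors have all been used, which is exactly the inaccessibility argument in the text.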

However, Fig. 10.9(b) is only the skeleton of the graph that we generate for 3-CNF expression E. For each clause $e_j$, we introduce another subgraph $I_j$, shown in Fig. 10.9(c). Gadget $I_j$ has the property that if a cycle enters at $r_j$, it must leave at $u_j$; if it enters at $s_j$ it must leave at $v_j$, and if it enters at $t_j$ it must leave at $w_j$. The argument we shall offer is that if the cycle, once it reaches $I_j$, does anything but leave by the node below the one in which it entered, then one or more nodes are inaccessible: they can never appear on the cycle. By symmetry, we can consider only the case where $r_j$ is the first node of $I_j$ on the cycle. There are three cases:
1. The next two vertices on the cycle are $s_j$ and $t_j$. If the cycle then goes to $w_j$ and leaves, $v_j$ is inaccessible. If the cycle goes to $w_j$ and $v_j$ and then leaves, $u_j$ is inaccessible. Thus, the cycle must leave at $u_j$, having traversed all six nodes of the gadget.


2. The next two vertices after $r_j$ are $s_j$ and $v_j$. If the cycle does not next go to $u_j$, then $u_j$ becomes inaccessible. If after $u_j$, the cycle next goes to $w_j$, then $t_j$ can never appear on the cycle. The argument is the “reverse” of the inaccessibility argument. Now, $t_j$ can be reached from outside, but if the cycle later includes $t_j$, there will be no next node possible, because both successors of $t_j$ appeared earlier on the cycle. Thus, in this case also, the cycle leaves by $u_j$. Note, however, that $t_j$ and $w_j$ are left untraversed; they will have to appear later on the cycle, which is possible.


3. The circuit goes from r_j directly to u_j. If the cycle then goes to w_j, then
t_j cannot appear on the cycle because its successors have both appeared
previously, as we argued in case (2). Thus, in this case, the cycle must
leave directly by u_j, leaving the other four nodes to be added to the cycle
later.
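The three cases can be double-checked exhaustively. The arc set below is our reading of Fig. 10.9(c), inferred from the case analysis rather than copied from the figure: r_j → s_j → t_j → r_j on top, u_j → w_j → v_j → u_j on the bottom, and r_j → u_j, s_j → v_j, t_j → w_j (the subscript j is dropped). Since arcs from the rest of G enter the gadget only at r, s, t and leave only from u, v, w, a Hamilton circuit meets the gadget in vertex-disjoint passes, each running from a top node to a bottom node. The sketch enumerates every way to cover the six nodes with such passes and asserts that each pass leaves directly below where it entered; all names are hypothetical.

```python
from itertools import combinations

# Our reading of the clause gadget I_j (subscript j dropped); inferred from
# the three cases in the text, not copied from Fig. 10.9(c).
ARCS = {
    "r": ["s", "u"], "s": ["t", "v"], "t": ["r", "w"],
    "u": ["w"], "w": ["v"], "v": ["u"],
}
ENTRIES = ("r", "s", "t")                      # targets of arcs from H gadgets
BELOW = {"r": "u", "s": "v", "t": "w"}         # claimed exit for each entry


def passes_from(start):
    """Yield every simple path from start that ends at a bottom (exit) node."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] in BELOW.values():
            yield tuple(path)
        for nxt in ARCS[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])


def covers():
    """Yield each set of vertex-disjoint passes that visits all six nodes."""
    candidates = [p for e in ENTRIES for p in passes_from(e)]
    for k in (1, 2, 3):                        # at most one pass per entry
        for combo in combinations(candidates, k):
            nodes = [v for p in combo for v in p]
            if len(nodes) == len(set(nodes)) == len(ARCS):
                yield combo


all_covers = list(covers())
for cover in all_covers:
    for p in cover:
        assert BELOW[p[0]] == p[-1], f"pass {p} leaves at the wrong node"
print(len(all_covers), "covers; every pass leaves below its entry")
```

The seven covers found are the three single passes through all six nodes, three two-pass covers such as r → s → v → u together with t → w, and one cover made of the three direct drops r → u, s → v, t → w.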


To complete the construction of the graph G for expression E, we connect
the I_j’s to the H_i’s as follows: Suppose the first literal in clause e_j is x_i, an
unnegated variable. Pick some node c_{ip}, for p in the range 0 to m_i - 1, that
has not yet been used for the purpose of connecting to one of the I gadgets.
Introduce arcs from c_{ip} to r_j and from u_j to b_{i,p+1}. If the first literal of
clause e_j is \bar{x}_i, a negated literal, then find an unused b_{ip}. Connect b_{ip}
to r_j and connect u_j to c_{i,p+1}.

For the second and third literals of e_j, we make the same additions to the
graph, with one exception. For the second literal, we use nodes s_j and v_j, and
for the third literal we use nodes t_j and w_j. Thus, each I_j has three connections
to the H gadgets that represent the variables involved in the clause e_j. The
connection comes from a c-node and returns to the b-node below if the literal
is unnegated, and it comes from a b-node, returning to the c-node below, if the
literal is negated. We claim that:


• The graph G so constructed has a directed Hamilton circuit if and only
if the expression E is satisfiable.


(If) Suppose there is a satisfying truth assignment T for E. Construct a directed 
Hamilton circuit as follows. 


1. Begin with the path that traverses only the H’s [i.e., the graph of Fig.
10.9(b)] according to the truth assignment T. That is, the cycle goes from
a_i to b_{i0} if T(x_i) = 1, and it goes from a_i to c_{i0} if T(x_i) = 0.


2. However, if the cycle constructed so far follows an arc from b_{ip} to c_{i,p+1},
and b_{ip} has another arc to one of the I_j’s that has not yet been included
in the cycle, introduce a “detour” in the cycle that includes all six nodes
of I_j on the cycle, returning to c_{i,p+1}. The arc b_{ip} → c_{i,p+1} will no longer
be on the cycle, but the nodes at its ends remain on the cycle.


3. Likewise, if the cycle has an arc from c_{ip} to b_{i,p+1}, and c_{ip} has another arc
out that goes to an I_j that has not yet been incorporated into the cycle,
modify the cycle to “detour” through all six nodes of I_j.


The fact that T satisfies E assures us that each clause e_j has a true literal,
and the H-gadget arc that lies in parallel with that literal’s connection to I_j
is on the path constructed by step (1). Thus, step (2) or (3) allows us to include
the gadget I_j for each clause e_j, all the I_j’s get included in the cycle,
and the cycle becomes a directed Hamilton circuit.


(Only-if) Now, suppose that the graph G has a directed Hamilton circuit. We 
must show that E is satisfiable. First, recall two important points from the 
analysis we have done so far: 


1. If a Hamilton circuit enters some I_j at r_j, s_j, or t_j, then it must leave at
u_j, v_j, or w_j, respectively.


2. Thus, if we view the Hamilton circuit as moving through the cycle of H
gadgets, as in Fig. 10.9(b), the excursions that the path makes to some I_j
can be viewed as if the cycle followed an arc that was “in parallel” with
one of the arcs b_{ip} → c_{i,p+1} or c_{ip} → b_{i,p+1}.


If we ignore the excursions to the I_j’s, then the Hamilton circuit must be one
of the 2^n cycles that are possible using the H_i’s only: those that make choices
to move from each a_i to either b_{i0} or c_{i0}. Each of these choices corresponds to a
truth assignment for the variables of E. If one of these choices yields a Hamilton
circuit including the I_j’s, then this truth assignment must satisfy E.

The reason is that if the cycle goes from a_i to b_{i0}, then we can only make an
excursion to I_j if the jth clause has x_i as one of its three literals. If the cycle
goes from a_i to c_{i0}, then we can only make an excursion to I_j if the jth clause
has \bar{x}_i as a literal. Thus, the fact that all I_j gadgets can be included implies
that the truth assignment makes at least one of the three literals of each clause
true; i.e., E is satisfiable.


Example 10.22: Let us give a very simple example of the construction of
Theorem 10.21, based on the 3-CNF expression
E = (x_1 + x_2 + x_3)(\bar{x}_1 + \bar{x}_2 + x_3).
The constructed graph is shown in Fig. 10.10. Arcs that connect H-type gadgets
to I-type gadgets are shown dotted, to improve readability, but there is no other
distinction between dotted and solid arcs.

For instance, at the top left, we see the gadget for x_1. Since x_1 appears
once negated and once unnegated, the “ladder” needs only one step, so there
are two rows of b’s and c’s. At the bottom left, we see the gadget for x_3, which
appears twice unnegated and does not appear negated. Thus, we need two
different c_{3p} → b_{3,p+1} arcs that we can use to attach the gadgets for I_1 and I_2
to represent uses of x_3 in these clauses. That is why the gadget for x_3 needs
three b-c rows.

Figure 10.10: Example of the Hamilton-circuit construction

Let us consider the gadget I_2, which corresponds to the clause (\bar{x}_1 + \bar{x}_2 + x_3).
For the first literal, \bar{x}_1, we attach b_{10} to r_2 and we attach u_2 to c_{11}. For the
second literal, \bar{x}_2, we do the same with b_{20}, s_2, v_2, and c_{21}. The third literal,
being unnegated, is attached to a c and the b below; that is, we attach c_{31} to
t_2 and w_2 to b_{32}.

One of several satisfying truth assignments is x_1 = 1, x_2 = 0, and x_3 = 0.
For this assignment, the first clause is satisfied by its first literal x_1, while the
second clause is satisfied by the second literal, \bar{x}_2. For this truth assignment,
we can devise a Hamilton circuit in which the arcs a_1 → b_{10}, a_2 → c_{20}, and
a_3 → c_{30} are present. The cycle covers the first clause by detouring from H_1 to
I_1; i.e., it uses the arc c_{10} → r_1, traverses all the nodes of I_1, and returns to b_{11}.
The second clause is covered by the detour from H_2 to I_2 starting with the arc
b_{20} → s_2, traversing all of I_2, and returning to c_{21}. The entire Hamilton cycle is
shown with thicker lines (solid or dotted) and very large arrows, in Fig. 10.10.
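Both halves of the proof can be checked by brute force on this example. The sketch below builds G for E = (x_1 + x_2 + x_3)(\bar{x}_1 + \bar{x}_2 + x_3), enumerates every directed Hamilton circuit, reads off a truth assignment from the arc leaving each a_i, and compares the result with the satisfying assignments of E. Several details are our assumptions rather than quotations: the H-gadget and I-gadget arc patterns of Fig. 10.9 (inferred from the text), the rule that H_i has one more row than the larger of the numbers of unnegated and negated occurrences of x_i (which reproduces the row counts given above), and the choice of the lowest unused row for each attachment (which reproduces the attachments given for I_2). Names such as build_graph are hypothetical.

```python
from itertools import product

# Example 10.22, literals as signed ints: +i is x_i, -i is \bar{x}_i (our encoding).
CLAUSES = [(1, 2, 3), (-1, -2, 3)]
N = 3
# Our reading of gadget I_j (Fig. 10.9(c)), inferred from the three cases.
GADGET = [("r", "s"), ("s", "t"), ("t", "r"), ("r", "u"), ("s", "v"),
          ("t", "w"), ("u", "w"), ("w", "v"), ("v", "u")]


def build_graph(clauses, n):
    """Sketch of the graph G of Theorem 10.21 as an adjacency dict."""
    arcs = {}

    def arc(x, y):
        arcs.setdefault(x, []).append(y)
        arcs.setdefault(y, [])

    for i in range(1, n + 1):
        # Assumed row rule: one more row than max(#x_i, #\bar{x}_i) occurrences.
        m = max(sum(lit == i for cl in clauses for lit in cl),
                sum(lit == -i for cl in clauses for lit in cl))
        arc(("a", i), ("b", i, 0))
        arc(("a", i), ("c", i, 0))
        for p in range(m + 1):
            b, c = ("b", i, p), ("c", i, p)
            arc(b, c)
            arc(c, b)
            arc(b, ("c", i, p + 1) if p < m else ("d", i))
            arc(c, ("b", i, p + 1) if p < m else ("d", i))
        arc(("d", i), ("a", i % n + 1))          # d_n wraps around to a_1
    next_row = {}                                  # lowest unused row per literal
    for j, clause in enumerate(clauses, start=1):
        for x, y in GADGET:
            arc((x, j), (y, j))
        for lit, entry, exit_ in zip(clause, "rst", "uvw"):
            i, p = abs(lit), next_row.get(lit, 0)
            next_row[lit] = p + 1
            out, back = ("c", "b") if lit > 0 else ("b", "c")
            arc((out, i, p), (entry, j))           # e.g. c_{ip} -> r_j
            arc((exit_, j), (back, i, p + 1))      # e.g. u_j -> b_{i,p+1}
    return arcs


def hamilton_circuits(arcs, start):
    """Yield every directed Hamilton circuit through start (exhaustive DFS)."""
    path, seen = [start], {start}

    def dfs():
        if len(path) == len(arcs):
            if start in arcs[path[-1]]:
                yield list(path)
            return
        for nxt in arcs[path[-1]]:
            if nxt not in seen:
                path.append(nxt)
                seen.add(nxt)
                yield from dfs()
                path.pop()
                seen.discard(nxt)

    yield from dfs()


G = build_graph(CLAUSES, N)
from_circuits = set()
for cycle in hamilton_circuits(G, ("a", 1)):
    succ = dict(zip(cycle, cycle[1:]))
    # Read x_i as true exactly when the circuit goes from a_i to b_{i0}.
    from_circuits.add(tuple(succ[("a", i)] == ("b", i, 0) for i in range(1, N + 1)))

satisfying = {t for t in product([False, True], repeat=N)
              if all(any(t[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in CLAUSES)}
assert from_circuits == satisfying
print(len(G), "nodes;", len(satisfying), "satisfying assignments, each from a circuit")
```

Exhaustive search is exponential in general, which is exactly why the theorem matters; it is practical here only because this G has 32 nodes.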


10.4.5 Undirected Hamilton Circuits and the TSP 


The proofs that the undirected Hamilton-circuit problem and the Traveling 
Salesman problem are also NP-complete are relatively easy. We already saw in 
Section 10.1.4 that TSP is in NP. HC is a special case of TSP, so it is also in
NP. We must perform the reductions of DHC to HC and HC to TSP. 


PROBLEM: Undirected Hamilton-Circuit Problem. 
INPUT: An undirected graph G. 
OUTPUT: “Yes” if and only if G has a Hamilton circuit. 


REDUCTION FROM: DHC. 


Theorem 10.23: HC is NP-complete. 


PROOF: We reduce DHC to HC, as follows. Suppose we are given a directed
graph G_d. The undirected graph we construct will be called G_u. For every
node v of G_d, there are three nodes v^{(0)}, v^{(1)}, and v^{(2)} in G_u. The edges of G_u
are:


1. For all nodes v of G_d, there are edges (v^{(0)}, v^{(1)}) and (v^{(1)}, v^{(2)}) in G_u.
2. If there is an arc v → w in G_d, then there is an edge (v^{(2)}, w^{(0)}) in G_u.


Figure 10.11 suggests the pattern of edges, including the edge for an arc v → w.
Clearly the construction of G_u from G_d can be performed in polynomial
time. We must show that


• G_u has a Hamilton circuit if and only if G_d has a directed Hamilton
circuit.


Figure 10.11: Arcs in G_d are replaced by edges in G_u that go from rank 2 to
rank 0
(If) Suppose v_1, v_2, ..., v_n, v_1 is a directed Hamilton circuit. Then surely

    v_1^{(0)}, v_1^{(1)}, v_1^{(2)}, v_2^{(0)}, v_2^{(1)}, v_2^{(2)}, ..., v_n^{(0)}, v_n^{(1)}, v_n^{(2)}, v_1^{(0)}

is an undirected Hamilton circuit in G_u. That is, we go down each column, and
then jump to the top of the next column to follow an arc of G_d.


(Only-if) Observe that each node v^{(1)} of G_u has only two edges, and therefore
must appear in a Hamilton circuit with one of v^{(0)} and v^{(2)} its immediate
predecessor, and the other its immediate successor. Thus, a Hamilton circuit in
G_u must have superscripts on its nodes that vary in the pattern 0,1,2,0,1,2,...
or its opposite, 2,1,0,2,1,0,.... Since these patterns correspond to traversing
a cycle in the two different directions, we may as well assume the pattern is
0,1,2,0,1,2,.... Thus, if we look at the edges of the cycle that go from a node
with superscript 2 to one with superscript 0, we know that these edges are arcs
of G_d, and that each is followed in the direction in which the arc points. Thus,
an undirected Hamilton circuit in G_u yields a directed Hamilton circuit in G_d.
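The reduction is short enough to test exhaustively on small inputs. The sketch below encodes v^{(k)} as the pair (v, k), builds G_u from G_d, and checks with a plain depth-first search that G_u has a Hamilton circuit exactly when G_d has a directed one, for all 64 directed graphs on three nodes without self-loops. The function names are hypothetical.

```python
def dhc_to_hc(nodes, arcs):
    """Build G_u from G_d (Theorem 10.23); node v^(k) is encoded as (v, k)."""
    adj = {(v, k): set() for v in nodes for k in range(3)}

    def edge(x, y):
        adj[x].add(y)
        adj[y].add(x)

    for v in nodes:                       # rule 1: v^(0) - v^(1) - v^(2)
        edge((v, 0), (v, 1))
        edge((v, 1), (v, 2))
    for v, w in arcs:                     # rule 2: arc v -> w gives v^(2) - w^(0)
        edge((v, 2), (w, 0))
    return adj


def has_hamilton_circuit(adj, start):
    """Exhaustive depth-first search; adj may be directed or undirected."""
    def extend(node, seen):
        if len(seen) == len(adj):
            return start in adj[node]
        return any(extend(nxt, seen | {nxt}) for nxt in adj[node] if nxt not in seen)

    return extend(start, {start})


nodes = [0, 1, 2]
pairs = [(v, w) for v in nodes for w in nodes if v != w]
with_circuit = 0
for mask in range(1 << len(pairs)):       # all 64 loop-free digraphs on 3 nodes
    arcs = [pair for k, pair in enumerate(pairs) if mask >> k & 1]
    directed = {v: [w for x, w in arcs if x == v] for v in nodes}
    answer = has_hamilton_circuit(directed, 0)
    assert answer == has_hamilton_circuit(dhc_to_hc(nodes, arcs), (0, 0))
    with_circuit += answer
print(with_circuit, "of", 1 << len(pairs), "digraphs have a directed Hamilton circuit")
```

Exactly 15 of the 64 digraphs contain one of the two directed 3-cycles, and the assertion confirms that G_u agrees with G_d on every one of the 64 inputs.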


PROBLEM: Traveling Salesman Problem. 


INPUT: An undirected graph G with integer weights on the edges, and a limit 
k. 


OUTPUT: “Yes” if and only if there is a Hamilton circuit of G, such that the 
sum of the weights on the edges of the cycle is less than or equal to k. 


REDUCTION FROM: HC. 


Theorem 10.24: The Traveling Salesman Problem is NP-complete. 
PROOF: The reduction from HC is as follows. Given a graph G, construct a
weighted graph G' whose nodes and edges are the same as the nodes and edges
of G, with a weight of 1 on each edge, and a limit k that is equal to the number
of nodes n of G. Every Hamilton circuit has exactly n edges, so a Hamilton
circuit of weight n exists in G' if and only if there is a Hamilton circuit in G.
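A minimal sketch of the reduction, paired with a brute-force TSP decider used only to check answers on tiny graphs (it tries all (n - 1)! orderings). Edges are unordered pairs, the graph is assumed to have at least three nodes, and the function names are hypothetical.

```python
from itertools import permutations


def hc_to_tsp(nodes, edges):
    """Theorem 10.24: same graph, weight 1 on every edge, limit k = n."""
    weights = {frozenset(e): 1 for e in edges}
    return nodes, weights, len(nodes)


def tsp(nodes, weights, k):
    """Brute force: is there a Hamilton circuit of total weight at most k?"""
    first, *rest = nodes
    for order in permutations(rest):
        tour = [first, *order, first]
        legs = [frozenset(leg) for leg in zip(tour, tour[1:])]
        if all(leg in weights for leg in legs) and sum(weights[l] for l in legs) <= k:
            return True
    return False


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
path = [(1, 2), (2, 3), (3, 4)]
print(tsp(*hc_to_tsp([1, 2, 3, 4], square)))   # True
print(tsp(*hc_to_tsp([1, 2, 3, 4], path)))     # False
```

The reduction itself is linear in the size of G; only the checker is exponential.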


Figure 10.12: Reductions among NP-complete problems


10.4.6 Summary of NP-Complete Problems 


Figure 10.12 indicates all the reductions we have made in this chapter. Notice
that we have suggested reductions from all the specific problems, like TSP, to
SAT. What happened was that we reduced the language of every polynomial-time,
nondeterministic Turing machine to SAT in Theorem 10.9. Without mentioning
it explicitly, these TM’s included at least one that solves TSP, one that
solves IS, and so on. Thus, all the NP-complete problems are polynomial-time
reducible to one another, and are, in effect, different faces of the same problem.


10.4.7 Exercises for Section 10.4 


Exercise 10.4.1: A k-clique in a graph G is a set of k nodes of G such that 
there is an edge between every two nodes in the clique. Thus, a 2-clique is just 
a pair of nodes connected by an edge, and a 3-clique is a triangle. The problem 
CLIQUE is: given a graph G and a constant k, does G have a k-clique? 
a) What is the largest k for which the graph G of Fig. 10.1 satisfies CLIQUE?
b) How many edges does a k-clique have, as a function of k? 


c) Prove that CLIQUE is NP-complete by reducing the node-cover problem 
to CLIQUE. 


*! Exercise 10.4.2: The coloring problem is: given a graph G and an integer k,
is G “k-colorable”; that is, can we assign one of k colors to each node of G in
such a way that no edge has both of its ends colored with the same color? For
example, the graph of Fig. 10.1 is 3-colorable, since we can assign nodes 1 and
4 the color red, 2 green, and 3 blue. In general, if a graph has a k-clique, then
it can be no less than k-colorable, although it might require many more than k
colors.


Figure 10.13: Part of the construction showing the coloring problem to be
NP-complete


In this exercise, we shall give part of a construction to show that the coloring 
problem is NP-complete; you must fill in the rest. The reduction is from 3SAT. 
Suppose that we have a 3-CNF expression with n variables. The reduction 
converts this expression into a graph, part of which is shown in Fig. 10.13. 
There are, as seen on the left, n + 1 nodes c_0, c_1, ..., c_n that form an
(n + 1)-clique. Thus, each of these nodes must be colored with a different color.
We should think of the color assigned to c_j as “the color c_j.”

Also, for each variable x_i, there are two nodes, which we may think of as x_i
and \bar{x}_i. These two are connected by an edge, so they cannot get the same color.
Moreover, each of the nodes for x_i is connected to c_j for all j other than 0 and
i. As a result, one of x_i and \bar{x}_i must be colored c_0, and the other is colored c_i.
Think of the one colored c_0 as true and the other as false. Thus, the coloring
chosen corresponds to a truth assignment.

To complete the construction, you need to design a portion of the graph for 
each clause of the expression. It should be possible to complete the coloring 
of the graph using only the colors c_0 through c_n, if and only if each clause is
made true by the truth assignment corresponding to the choice of colors. Thus, 
the constructed graph is (n + 1)-colorable if and only if the given expression is 
satisfiable. 


Figure 10.14: A graph


! Exercise 10.4.3: A graph does not have to be too large before NP-complete 
questions about it become very hard to solve by hand. Consider the graph of 
Fig. 10.14. 


a) Does this graph have a Hamilton circuit?

b) What is the largest independent set?

c) What is the smallest node cover?

d) What is the smallest edge cover (see Exercise 10.4.4(c))?

e) Is the graph 2-colorable?


Exercise 10.4.4: Show the following problems to be NP-complete: 


a) The subgraph-isomorphism problem: given graphs G_1 and G_2, does G_1
contain a copy of G_2 as a subgraph? That is, can we find a subset of the
nodes of G_1 that, together with the edges among them in G_1, forms an
exact copy of G_2 when we choose the correspondence between nodes of G_2
and nodes of the subgraph of G_1 properly? Hint: Consider a reduction
from the clique problem of Exercise 10.4.1.
b) The feedback arc problem: given a graph G and an integer k, does G have
a set of k arcs such that every directed cycle of G contains at least one of
the k arcs?


c) The linear integer programming problem: given a set of linear constraints
of the form Σ_{i=1}^{n} a_i x_i ≤ c or Σ_{i=1}^{n} a_i x_i ≥ c, where the a’s and c are
integer constants and x_1, x_2, ..., x_n are variables, does there exist an assignment
of integers to each of the variables that makes all the constraints true?


d) The dominating-set problem: given a graph G and an integer k, does
there exist a subset S of k nodes of G such that each node is either in S
or adjacent to a node of S?


e) The firehouse problem: given a graph G, a distance d, and a budget f of
“firehouses,” is it possible to choose f nodes of G such that no node is of
distance (number of edges that must be traversed) greater than d from
some firehouse?


f) The half-clique problem: Given a graph G with an even number of vertices,
does there exist a clique of G (see Exercise 10.4.1) consisting of exactly
half the nodes of G? Hint: Reduce CLIQUE to the half-clique problem.
You must figure out how to add nodes to adjust the size of the largest
clique.


g) The unit-execution-time-scheduling problem: given k “tasks” T_1, T_2, ..., T_k,
a number of “processors” p, a “time limit” t, and some “precedence constraints”
of the form T_i < T_j between pairs of tasks, does there exist a schedule of the
tasks, such that:

   1. Each task is assigned to one time unit between 1 and t,
   2. At most p tasks are assigned to any one time unit, and
   3. The precedence constraints are respected; that is, if T_i < T_j is a
      constraint, then T_i is assigned to an earlier time unit than T_j?


h) The exact-cover problem: given a set S and a set of subsets S_1, S_2, ..., S_n
of S, is there a set of sets T ⊆ {S_1, S_2, ..., S_n} such that each element x
of S is in exactly one member of T?


i) The knapsack problem: given a list of k integers i_1, i_2, ..., i_k, can we
partition them into two sets whose sums are the same? Note: This problem
appears superficially to be in P, since you might assume that the
integers themselves are small. Indeed, if the values of the integers are
limited to some polynomial in the number of integers k, then there is a
polynomial-time algorithm. However, in a list of k integers represented in
binary, having total length n, we can have certain integers whose values
are almost exponential in n.
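The note in part (i) about small values can be made concrete. The following sketch, our own illustration rather than part of the exercise, decides the partition question with a bit-set of reachable subset sums. It performs k shifts of a bit-set of roughly S bits, where S is the sum of the integers, so it is polynomial in k when the values are polynomial in k, but S can be exponential in the length n of a binary encoding. Nonnegative integers are assumed.

```python
def can_partition(values):
    """Can the nonnegative integers in values be split into two equal-sum parts?"""
    total = sum(values)
    if total % 2:
        return False
    reachable = 1                      # bit s is set iff some subset sums to s
    for v in values:
        reachable |= reachable << v    # either leave v out or add it in
    return bool(reachable >> (total // 2) & 1)


print(can_partition([3, 1, 1, 2, 2, 1]))   # True: 3 + 2 = 1 + 1 + 2 + 1
print(can_partition([1, 2, 5]))            # False
```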
Exercise 10.4.5: A Hamilton path in a graph G is an ordering of all the nodes
n_1, n_2, ..., n_k such that there is an edge from n_i to n_{i+1}, for all i = 1, 2, ..., k-1.
A directed Hamilton path is the same for a directed graph; there must be an arc
from each n_i to n_{i+1}. Notice that the Hamilton path requirement is just slightly
weaker than the Hamilton-circuit condition. If we also required an edge or arc
from n_k to n_1, then it would be exactly the Hamilton-circuit condition. The
(directed) Hamilton-path problem is: given a (directed) graph, does it have at
least one (directed) Hamilton path?


* a) Prove that the directed Hamilton-path problem is NP-complete. Hint: 
Perform a reduction from DHC. Pick any node, and split it into two, such 
that these two nodes must be the endpoints of a directed Hamilton path, 
and such a path exists if and only if the original graph has a directed 
Hamilton circuit. 


b) Show that the (undirected) Hamilton-path problem is NP-complete. Hint:
Adapt the construction of Theorem 10.23.


*! c) Show that the following problem is NP-complete: given a graph G and
an integer k, does G have a spanning tree with at most k leaf vertices?
Hint: Perform a reduction from the Hamilton-path problem.


d) Show that the following problem is NP-complete: given a graph G and
an integer d, does G have a spanning tree with no node of degree greater
than d? (The degree of a node n in the spanning tree is the number of
edges of the tree that have n as an end.)


10.5 Summary of Chapter 10 


+ The Classes P and NP: P consists of all those languages or problems 
accepted by some Turing machine that runs in some polynomial amount 
of time, as a function of its input length. MP is the class of languages or 
problems that are accepted by nondeterministic TM’s with a polynomial 
bound on the time taken along any sequence of nondeterministic choices. 


+ The P = NP Question: It is unknown whether or not P and NP are 
really the same classes of languages, although we suspect strongly that 
there are languages in NP that are not in P. 


+ Polynomial-Time Reductions: If we can transform instances of one problem
in polynomial time into instances of a second problem that has the
same answer (yes or no), then we say the first problem is polynomial-time
reducible to the second.


+ NP-Complete Problems: A language is NP-complete if it is in NP, and
there is a polynomial-time reduction from each language in NP to the
language in question. We believe strongly that none of the NP-complete
problems are in P, and the fact that no one has ever found a polynomial-time
algorithm for any of the thousands of known NP-complete problems
is mutually reinforcing evidence that none are in P.


+ NP-Complete Satisfiability Problems: Cook’s theorem showed the first
NP-complete problem (whether a boolean expression is satisfiable)
by reducing all problems in NP to the SAT problem in polynomial time.
In addition, the problem remains NP-complete even if the expression is
restricted to consist of a product of clauses, each of which consists of only
three literals: the problem 3SAT.


+ Other NP-Complete Problems: There is a vast collection of known
NP-complete problems; each is proved NP-complete by a polynomial-time
reduction from some previously known NP-complete problem. We have
given reductions that show the following problems NP-complete: independent
set, node cover, directed and undirected versions of the Hamilton
circuit problem, and the traveling-salesman problem.


10.6 Gradiance Problems for Chapter 10 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 10.1: In the following expressions, - represents negation of a variable
(for example, -x stands for “NOT x”), + represents logical OR, and
juxtaposition represents logical AND (e.g., (x + y)(y + z) represents
(x OR y) AND (y OR z)). Identify the expression that is satisfiable, from
the list below.


Problem 10.2: Suppose there are three languages (i.e., problems), of which 
we know the following: 


• L_1 is in P.
• L_2 is NP-complete.
• L_3 is not in NP.


Suppose also that we do not know anything about the resolution of the “P 
vs. NP” question; for example, we do not know definitely whether P = NP. 
Classify each of the following languages as (a) Definitely in P, (b) Definitely
in NP (but perhaps not in P and perhaps not NP-complete), (c) Definitely
NP-complete, or (d) Definitely not in NP:
1. L_1 ∪ L_2.
2. L_1 ∩ L_2.
3. L_2cL_3, where c is a symbol not in the alphabet of L_2 or L_3 (i.e., the
marked concatenation of L_2 and L_3, where there is a unique marker symbol
between the strings from L_2 and L_3).
4. The complement of L_2.


Based on your analysis, pick the correct, definitely true statement from the list 
below. 


Problem 10.3: The classes of languages P and NP are closed under certain
operations, and not closed under others, just like classes such as the regular 
languages or context-free languages have closure properties. Decide whether P 
and NP are closed under each of the following operations: 


1. Union.
2. Intersection.
3. Intersection with a regular language.
4. Concatenation.
5. Kleene closure (star).
6. Homomorphism.
7. Inverse homomorphism.

Then, select from the list below the true statement.


Problem 10.4: The Boolean expression wxyz + u + v is equivalent to an 
expression in 3-CNF (a product of clauses, each clause being the sum of exactly 
three literals). Find the simplest such 3-CNF expression and then identify 
one of its clauses in the list below. Note: -e denotes the negation of e. Also 
note: we are looking for an expression that involves only u, v, w, x, y, and 
z, no other variables. Not all boolean expressions can be converted to 3-CNF 
without introducing new variables, but this one can. 


Problem 10.5: The polynomial-time reduction from SAT to CSAT, as de- 
scribed in Section 10.3.3, needs to introduce new variables. The reason is that 
the obvious manipulation of a boolean expression into an equivalent CNF ex- 
pression could exponentiate the size of the expression, and therefore could not 
be polynomial time. Suppose we apply this construction to the expression 
(u + (vw)) + x, with the parse implied by the parentheses. Suppose also that 
when we introduce new variables, we use y_1, y_2, and so on. After constructing the
corresponding CNF expression, identify one of its clauses from the list below. 
Note: logical OR is represented by +, logical AND by juxtaposition, and logical 
NOT by -. 
Problem 10.6: There is a Turing transducer T that transforms problem P_1
into problem P_2. T has one read-only input tape, on which an input of length n
is placed. T has a read-write scratch tape on which it uses O(S(n)) cells. T has
a write-only output tape, with a head that moves only right, on which it writes 
an output of length O(U(n)). With input of length n, T runs for O(T(n)) time 
before halting. You may assume that each of the upper bounds on space and 
time used are as tight as possible. A given combination of S(n), U(n), and 
T(n) may: 


1. Imply that T is a polynomial-time reduction of P_1 to P_2.
2. Imply that T is NOT a polynomial-time reduction of P_1 to P_2.


3. Be impossible; i.e., there is no Turing machine that has that combination 
of tight bounds on the space used, output size, and running time. 


What are all the constraints on S(n), U(n), and T(n) if T is a polynomial- 
time reducer? What are the constraints on feasibility, even if the reduction 
is not polynomial-time? After working out these constraints, identify the true 
statement from the list below. 


Problem 10.7: Use the construction from Theorem 10.15 to convert the following
clauses:


1. (a+b)
2. (c+d+e+f)
3. (g+h+i+j+k+l+m)


to clauses with 3 literals per clause. In each case, the new clauses must be
satisfiable if and only if the original clause is satisfiable. For the first clause,
introduce variables x_1, x_2, ... in that order from the left; for the second introduce
y_1, y_2, ... in that order from the left, and for the third introduce z_1, z_2, ...
in that order from the left. Use -w as shorthand for NOT w. Then identify, in
the list below, the one clause that would appear among the clauses generated
by the construction.


Problem 10.8: The proof that the Independent-Set problem is NP-complete 
depends on a construction given in Theorem 10.18, which reduces 3SAT to 
Independent Sets. Apply this construction to the 3SAT instance: 


(u + v + w)(-v + -w + z)(-u + -z + y)(x + -y + z)(u + -w + -z)


Note that - denotes negation, e.g., -v stands for the literal NOT v. Also, 
remember that the construction involves the creation of nodes denoted [i, j]. 
The node [i,j] corresponds to the jth literal of the ith clause. For example, 
[1,2] corresponds to the occurrence of v. After performing the construction, 
identify from the list below the one pair of nodes that does NOT have an
edge between them. 
Problem 10.9: How large can an independent set be in the graph below
[shown on-line by the Gradiance system]? Identify one of the maximal inde- 
pendent sets in the list below. 


Problem 10.10: What is the size of a minimal node cover for the graph 
below [shown on-line by the Gradiance system]? Identify one of the minimal 
node covers below. 


Problem 10.11: Find all the minimum-weight Hamilton circuits in the graph 
below [shown on-line by the Gradiance system]: Then, identify in the list below 
the edge that is not on any minimum-weight Hamilton circuit. 


10.7 References for Chapter 10 


The concept of NP-completeness as evidence that the problem could not be 
solved in polynomial time, as well as the proof that SAT, CSAT, and 3SAT are 
NP-complete, comes from Cook [3]. A follow-on paper by Karp [6] is generally 
accorded equal importance, because that paper showed that NP-completeness 
was not just an isolated phenomenon, but rather applied to very many of the 
hard combinatorial problems that people in Operations Research and other 
disciplines had been studying for years. Each of the problems proved
NP-complete in Section 10.4 is from that paper: independent set, node cover,
Hamilton circuit, and TSP. In addition, we can find there the solutions to
several of the problems mentioned in the exercises: clique, edge cover, knapsack,
coloring, and exact-cover.

The book by Garey and Johnson [4] summarizes a great deal about what 
is known concerning which problems are NP-complete, and special cases that 
are polynomial-time. In [5] are articles about approximating the solution to an 
NP-complete problem in polynomial time. 

Several other contributions to the theory of NP-completeness should be ac- 
knowledged. The study of classes of languages defined by the running time 
of Turing machines began with Hartmanis and Stearns [8]. Cobham [2] was 
the first to isolate the concept of the class P, as opposed to algorithms that 
had a particular polynomial running time, such as O(n^2). Levin [7] was an
independent, although somewhat later, discovery of the NP-completeness idea. 

NP-completeness of linear integer programming [Exercise 10.4.4(c)] appears 
in [1] and also in unpublished notes of J. Gathen and M. Sieveking. NP- 
completeness of unit-execution-time scheduling [Exercise 10.4.4(g)] is from [9]. 
Chapter 11

Additional Classes of Problems


The story of intractable problems does not begin and end with NP. There are
many other classes of problems that appear to be intractable, or are interesting
for some other reason. Several questions involving these classes, like the
P = NP question, remain unresolved.

We shall begin by looking at a class that is closely related to P and NP: the
class of complements of NP languages, often called “co-NP.” If P = NP, then
co-NP is equal to both, since P is closed under complementation. However, it
is likely that co-NP is different from both these classes, and in fact likely that
no NP-complete problem is in co-NP.

Then, we consider the class PS, which is all the problems that can be solved 
by a Turing machine using an amount of tape that is polynomial in the length of 
its input. These TM’s are allowed to use an exponential amount of time, as long 
as they stay within a limited region of the tape. In contrast to the situation for 
polynomial time, we can prove that nondeterminism doesn’t increase the power 
of the TM when the limitation is polynomial space. However, even though PS 
clearly includes all of NP, we do not know whether PS is equal to NP, or even 
whether it is equal to P. We expect that neither equality is true, however, and 
we give a problem that is complete for PS and appears not to be in NP. 

Then, we turn to randomized algorithms, and two classes of languages that 
lie between P and NP. One is the class RP of “random polynomial” languages. 
These languages have an algorithm that runs in polynomial time, using some 
“coin flipping” or (in practice) a random-number generator. The algorithm 
either confirms membership of the input in the language, or says “I don’t know.” 
Moreover, if the input is in the language, then there is some probability greater 
than 0 that the algorithm will report success, so repeated application of the 
algorithm will, with probability approaching 1, confirm membership. 
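The claim that repetition drives the probability of confirmation toward 1 is a one-line calculation. Assuming each run independently reports success with probability at least p > 0 on an input w in the language, all t runs fail with probability at most (1 - p)^t, so:

```latex
\Pr[\text{at least one of } t \text{ runs confirms } w \in L]
  \;\ge\; 1 - (1 - p)^t \;\ge\; 1 - e^{-pt} \;\longrightarrow\; 1
  \qquad (t \to \infty)
% Example: p = 1/2 and t = 20 give at least 1 - 2^{-20} > 0.999999.
```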

The second class, called ZPP (zero-error, probabilistic polynomial), also 
involves randomization. However, algorithms for languages in this class either 
say “yes” the input is in the language, or “no” it is not. The expected running
time of the algorithm is polynomial. However, there might be runs of the 
algorithm that take more time than would be allowed by any polynomial bound. 

To tie these concepts together, we consider the important issue of primality 
testing. Many cryptographic systems today rely on both: 


1. The ability to discover large primes quickly (in order to allow communication
between machines in a way that is not subject to interception by
an outsider) and


2. The assumption that it takes exponential time to factor integers, if time 
is measured as a function of the length n of the integer written in binary. 


The complexity of primality testing has long been an open question. On the
one hand, as we shall show, the problem lies in both NP and co-NP, and
therefore is unlikely to be NP-complete. However, until recently, no polynomial-time
algorithm was known for the problem. There was, however, an elegant and
practical randomized algorithm, whereby it can be concluded that primality
testing is in RP. This ambiguous situation was resolved very recently with the
discovery of a deterministic, polynomial-time algorithm to test primality. We
shall only describe the randomized algorithm; it works well in practice and is
easy to implement, an important requirement in cryptographic systems where
primality testing is an important component.
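To make the one-sided behavior of such randomized tests concrete, here is a sketch of the Miller-Rabin test, a widely used randomized primality test that we substitute for illustration; it is not necessarily the algorithm described later in this chapter. A base a that passes the witness check is a certificate that n is composite, so the answer “composite” is always correct, while a composite n escapes detection in one trial with probability at most 1/4 (the standard bound). The function names and the default of 20 trials are our choices.

```python
import random


def is_witness(a, n):
    """True if base a proves the odd number n > 3 composite (Miller-Rabin)."""
    d, s = n - 1, 0
    while d % 2 == 0:                  # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x in (1, n - 1):
        return False
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True


def probably_prime(n, trials=20, rng=random):
    """False means certainly composite; True means no witness was found."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(trials):
        if is_witness(rng.randrange(2, n - 1), n):
            return False
    return True


rng = random.Random(0)                 # fixed seed so runs are repeatable
print([n for n in range(2, 60) if probably_prime(n, rng=rng)])
print(probably_prime(561, rng=rng), probably_prime(2**61 - 1, rng=rng))
```

A prime is never reported composite. The Carmichael number 561 should be reported composite and the Mersenne prime 2^61 - 1 prime; with 20 trials, the chance of misclassifying a composite is at most 4^-20.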


11.1 Complements of Languages in NP 


The class of languages P is closed under complementation (see Exercise 10.1.6).
For a simple argument why, let L be in P and let M be a polynomial-time TM for L
(so M always halts). Modify M as follows, to accept $\overline{L}$. Introduce a new
accepting state q and have the new TM transition to q whenever M halts in a state
that is not accepting. Make the former accepting states of M be nonaccepting. Then
the modified TM accepts $\overline{L}$, and runs in the same amount of time that M does,
with the possible addition of one move. Thus, $\overline{L}$ is in P if L is.
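As a minimal sketch of this construction, the Python below models a deterministic one-tape TM as a transition dictionary. The encoding and the helper names run_tm, accepts, and complement_accepts are hypothetical, chosen only for illustration. The complemented machine accepts exactly when M halts in a nonaccepting state, paying one extra move to enter the new accepting state q.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of a deterministic one-tape TM:
# (state, symbol) -> (new_state, symbol_to_write, head_move), head_move in {-1, +1}.
# A missing entry means M halts in that state.
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]
BLANK = "B"

def run_tm(delta: Delta, start: str, w: str, max_steps: int = 10_000) -> Tuple[str, int]:
    """Run M on w until it halts; return (halting state, number of moves made)."""
    tape = dict(enumerate(w))          # cell index -> symbol; absent cells are blank
    state, head, steps = start, 0, 0
    while (state, tape.get(head, BLANK)) in delta:
        if steps >= max_steps:         # guard for the sketch; a TM for L in P always halts
            raise RuntimeError("step bound exceeded")
        new_state, write, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = write
        state = new_state
        head = max(0, head + move)     # semi-infinite tape: never move left of cell 0
        steps += 1
    return state, steps

def accepts(delta: Delta, start: str, accepting: Set[str], w: str) -> Tuple[bool, int]:
    """Original machine M: accept iff it halts in an accepting state."""
    state, steps = run_tm(delta, start, w)
    return state in accepting, steps

def complement_accepts(delta: Delta, start: str, accepting: Set[str], w: str) -> Tuple[bool, int]:
    """Modified machine: M's accepting states are now nonaccepting, and when M
    halts in a nonaccepting state we make one extra move into the new state q."""
    state, steps = run_tm(delta, start, w)
    if state in accepting:
        return False, steps
    return True, steps + 1             # the extra move into q

# Example M: accepts binary strings containing an even number of 1's.
PARITY: Delta = {
    ("even", "0"): ("even", "0", +1),
    ("even", "1"): ("odd", "1", +1),
    ("odd", "0"): ("odd", "0", +1),
    ("odd", "1"): ("even", "1", +1),
}
```

For instance, on input 11 the machine M halts in state even after two moves, so accepts returns (True, 2) while complement_accepts returns (False, 2); on input 1 the roles swap and the complemented machine uses one extra move.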

It is not known whether NP is closed under complementation. It appears 
not, however, and in particular we expect that whenever a language L is NP- 
complete, then its complement is not in NP. 


11.1.1 The Class of Languages Co-NP 


Co-NP is the set of languages whose complements are in NP. We observed
at the beginning of Section 11.1 that every language in P has its complement
also in P, and therefore in NP; thus P is contained in both NP and co-NP. On the
other hand, we believe that none of the NP-complete problems have their
complements in NP, and therefore no NP-complete problem is in co-NP. Likewise,
we believe the complements of NP-complete problems, which are by definition in
co-NP, are not in NP.

Figure 11.1 shows the way we believe the classes P, NP, and co-NP relate.
However, we should bear in mind that, should P turn out to equal NP, then
all three classes are actually the same.


Figure 11.1: Suspected relationship between co-NP and other classes of languages (regions labeled “NP-complete problems” and “Complements of NP-complete problems”)


Example 11.1: Consider the complement of the language SAT, which is surely
a member of co-NP. We shall refer to this complement as USAT (unsatisfiable).
The strings in USAT include all those that code boolean expressions that are
not satisfiable. However, also in USAT are those strings that do not code valid
boolean expressions, because surely none of those strings are in SAT. We believe
that USAT is not in NP, but there is no proof.

Another example of a problem we suspect is in co-NP but not in NP is 
TAUT, the set of all (coded) boolean expressions that are tautologies; i.e., they 
are true for every truth assignment. Note that an expression E is a tautology
if and only if ¬E is unsatisfiable. Thus, TAUT and USAT are related in that
whenever boolean expression E is in TAUT, ¬E is in USAT, and vice versa.
However, USAT also contains strings that do not represent valid expressions, 
while all strings in TAUT are valid expressions. 
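The correspondence between TAUT and USAT can be checked by brute force on small expressions. The sketch below assumes a hypothetical encoding in which a boolean expression is a Python function from a truth assignment to a truth value; that encoding sidesteps the invalid strings that USAT also contains. It takes time exponential in the number of variables and says nothing about whether either problem is in NP.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

# Hypothetical encoding: an expression maps an assignment (name -> bool) to bool.
Formula = Callable[[Dict[str, bool]], bool]

def assignments(variables: Sequence[str]) -> Iterator[Dict[str, bool]]:
    """Enumerate all 2**len(variables) truth assignments."""
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))

def is_tautology(e: Formula, variables: Sequence[str]) -> bool:
    return all(e(a) for a in assignments(variables))

def is_satisfiable(e: Formula, variables: Sequence[str]) -> bool:
    return any(e(a) for a in assignments(variables))

def negate(e: Formula) -> Formula:
    return lambda a: not e(a)

samples: Dict[str, Formula] = {
    "x + ¬x": lambda a: a["x"] or not a["x"],
    "xy": lambda a: a["x"] and a["y"],
    "¬x + y + x¬y": lambda a: (not a["x"]) or a["y"] or (a["x"] and not a["y"]),
}

# E is a tautology exactly when ¬E is unsatisfiable.
for name, e in samples.items():
    assert is_tautology(e, ["x", "y"]) == (not is_satisfiable(negate(e), ["x", "y"])), name
```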


11.1.2 NP-Complete Problems and Co-NP


Let us assume that P ≠ NP. It is still possible that the situation regarding
co-NP is not exactly as suggested by Fig. 11.1, because we could have NP and
co-NP equal, but larger than P. That is, we might discover that problems like
USAT and TAUT can be solved in nondeterministic polynomial time (i.e., they
are in NP), and yet not be able to solve them in deterministic polynomial time.
However, the fact that we have not been able to find even one NP-complete
problem whose complement is in NP is strong evidence that NP ≠ co-NP, because,
as we prove in the next theorem, a single such problem would make NP and co-NP
equal.


Theorem 11.2: NP = co-NP if and only if there is some NP-complete prob- 
lem whose complement is in NP. 


PROOF: (Only-if) Should NP and co-NP be the same, then surely every NP-complete
problem L, being in NP, is also in co-NP. But the complement of a
problem in co-NP is in NP, so the complement of L is in NP.


(If) Suppose P is an NP-complete problem whose complement $\overline{P}$ is in NP.
Then for every language L in NP, there is a polynomial-time reduction of L
to P. The same reduction also is a polynomial-time reduction of $\overline{L}$ to $\overline{P}$. We
prove that NP = co-NP by proving containment in both directions.


NP ⊆ co-NP: Suppose L is in NP. Then $\overline{L}$ is in co-NP. Combine the
polynomial-time reduction of $\overline{L}$ to $\overline{P}$ with the assumed nondeterministic,
polynomial-time algorithm for $\overline{P}$ to yield a nondeterministic, polynomial-time
algorithm for $\overline{L}$. Hence, for any L in NP, $\overline{L}$ is also in NP. Therefore L, being the
complement of a language in NP, is in co-NP. This observation tells us that
NP ⊆ co-NP.


co-NP ⊆ NP: Suppose L is in co-NP. Then there is a polynomial-time
reduction of $\overline{L}$ to P, since P is NP-complete, and $\overline{L}$ is in NP. This reduction
is also a reduction of L to $\overline{P}$. Since $\overline{P}$ is in NP, we combine the reduction
with the nondeterministic, polynomial-time algorithm for $\overline{P}$ to show that L is
in NP.


11.1.3 Exercises for Section 11.1 


Exercise 11.1.1: Below are some problems. For each, tell whether it is in 
NP and whether it is in co-NP. Describe the complement of each problem. If 
either the problem or its complement is NP-complete, prove that as well. 


* a) The problem TRUE-SAT: given a boolean expression E that is true when 
all the variables are made true, is there some other truth assignment 
besides all-true that makes E true? 


b) The problem FALSE-SAT: given a boolean expression E that is false 
when all its variables are made false, is there some other truth assignment 
besides all-false that makes E false? 

c) The problem DOUBLE-SAT: given a boolean expression E, are there at 
least two truth assignments that make E true? 

d) The problem NEAR-TAUT: given a boolean expression E, is there at
most one truth assignment that makes E false?


*! Exercise 11.1.2: Suppose there were a function f that is a one-one function 
from n-bit integers to n-bit integers, such that: 
1. f(x) can be computed in polynomial time.
2. $f^{-1}(x)$ cannot be computed in polynomial time.


Show that the language consisting of pairs of integers (x, y) such that

$$f^{-1}(x) < y$$

would then be in (NP ∩ co-NP) − P.


11.2 Problems Solvable in Polynomial Space 


Now, let us look at a class of problems that includes all of NP, and appears to 
include more, although we cannot be certain it does. This class is defined by 
allowing a Turing machine to use an amount of space that is polynomial in the 
size of its input, no matter how much time it uses. Initially, we shall distinguish 
between the languages accepted by deterministic and nondeterministic TM’s 
with a polynomial space bound, but we shall soon see that these two classes of 
languages are the same. 

There are complete problems P for polynomial space, in the sense that all 
problems in this class are reducible in polynomial time to P. Thus, if P is in 
P or in NP, then all languages with polynomial-space-bounded TM’s are in P 
or NP, respectively. We shall offer one example of such a problem: “quantified 
boolean formulas.” 


11.2.1 Polynomial-Space Turing Machines 


A polynomial-space-bounded Turing machine is suggested by Fig. 11.2. There 
is some polynomial p(n) such that when given input w of length n, the TM 
never visits more than p(n) cells of its tape. By Theorem 8.12, we may assume 
that the tape is semi-infinite, and the TM never moves left from the beginning 
of its input. 

Define the class of languages PS (polynomial space) to include all and only 
the languages that are L(M) for some polynomial-space-bounded, deterministic 
Turing machine M. Also, define the class NPS (nondeterministic polynomial 
space) to consist of those languages that are L(M) for some nondeterministic, 
polynomial-space-bounded TM M. Evidently PS ⊆ NPS, since every deterministic
TM is technically nondeterministic also. However, we shall prove the
surprising result that PS = NPS.¹


¹You may see this class written as PSPACE in other works on the subject. However,
we prefer to use the script PS to denote the class of problems solved in deterministic (or 
nondeterministic) polynomial space, as we shall drop the use of NPS once the equivalence 
PS = NPS has been proved. 
Figure 11.2: A TM that uses polynomial space (finite control; the input w occupies n cells; at most p(n) cells are ever used)


11.2.2 Relationship of PS and NPS to Previously Defined Classes


To start, the relationships P ⊆ PS and NP ⊆ NPS should be obvious. The
reason is that if a TM makes only a polynomial number of moves, then it uses
no more than a polynomial number of cells; in particular, it cannot visit more
cells than one plus the number of moves it makes. Once we prove PS = NPS,
we shall see that in fact the three classes form a chain of containment:
P ⊆ NP ⊆ PS.

An essential property of polynomial-space-bounded TM’s is that they can 
make only an exponential number of moves before they must repeat an ID. We 
need this fact to prove other interesting facts about PS, and also to show that 
PS contains only recursive languages; i.e., languages with algorithms. Note 
that there is nothing in the definition of PS or NPS that requires the TM to 
halt. It is possible that the TM cycles forever, without leaving a polynomial- 
sized region of its tape. 


Theorem 11.3: If M is a polynomial-space-bounded TM (deterministic or
nondeterministic), and p(n) is its polynomial space bound, then there is a constant
c such that if M accepts its input w of length n, it does so within $c^{1+p(n)}$
moves.


PROOF: The essential idea is that M must repeat an ID before making more
than $c^{1+p(n)}$ moves. If M repeats an ID and then accepts, there must be a
shorter sequence of ID’s leading to acceptance. That is, if $\alpha \vdash^* \beta \vdash^* \beta \vdash^* \gamma$,
where $\alpha$ is the initial ID, $\beta$ is the repeated ID, and $\gamma$ is the accepting ID, then
$\alpha \vdash^* \beta \vdash^* \gamma$ is a shorter sequence of ID’s leading to acceptance.
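The shortening step can be made concrete. The following sketch (the function name remove_repeats is hypothetical, and ID's are modeled as arbitrary hashable values) takes a computation in which each ID follows the previous one by a single move and cuts out every loop $\beta \vdash^* \beta$, so the result starts and ends with the same ID's but never repeats one.

```python
from typing import Dict, Hashable, List

def remove_repeats(ids: List[Hashable]) -> List[Hashable]:
    """Shorten a computation (each ID follows the previous one by one move)
    so that no ID appears twice.

    Whenever an ID recurs, the loop between its two occurrences is cut out.
    Every adjacent pair in the result is still an adjacent pair of the input,
    so the result is a legal computation with the same first and last ID."""
    result: List[Hashable] = []
    position: Dict[Hashable, int] = {}   # ID -> its index in result
    for current in ids:
        if current in position:
            cut = position[current]
            for dropped in result[cut + 1:]:
                del position[dropped]
            del result[cut + 1:]          # current is the last ID again
        else:
            position[current] = len(result)
            result.append(current)
    return result
```

For example, the computation a, b, c, b, d becomes a, b, d: the loop b, c, b is removed.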

The argument that c must exist exploits the fact that there is only a limited
number of ID’s if the space used by the TM is limited. In particular, let t be
the number of tape symbols of M, and let s be the number of states of M.
Then the number of different ID’s of M when only p(n) tape cells are used is
at most $s\,p(n)\,t^{p(n)}$. That is, we can choose one of the s states, place the head
at any of p(n) tape positions, and fill the p(n) cells with any of $t^{p(n)}$ sequences
of tape symbols.

Pick c = s + t. Then consider the binomial expansion of $(t+s)^{1+p(n)}$, which
is

$$t^{1+p(n)} + \bigl(1+p(n)\bigr)\,s\,t^{p(n)} + \cdots$$

Notice that the second term is at least as large as $s\,p(n)\,t^{p(n)}$, which proves that
$c^{1+p(n)}$ is at least equal to the number of possible ID’s of M. We conclude the
proof by observing that if M accepts w of length n, then it does so by a sequence
of moves that does not repeat an ID. Therefore, M accepts by a sequence of
moves that is no longer than the number of distinct ID’s, which is at most $c^{1+p(n)}$.
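The counting argument can be spot-checked numerically. The sketch below (all function names are hypothetical) compares the bound $s\,p\,t^{p}$ on the number of ID's with the second binomial term and with $c^{1+p}$ for small values; it is a sanity check on the inequalities used above, not a substitute for the proof.

```python
def id_count_bound(s: int, t: int, p: int) -> int:
    """s * p * t**p: choose a state, a head position, and the contents of p cells."""
    return s * p * t ** p

def second_binomial_term(s: int, t: int, p: int) -> int:
    """(1 + p) * s * t**p, the second term of the expansion of (t + s)**(1 + p)."""
    return (1 + p) * s * t ** p

def theorem_bound(s: int, t: int, p: int) -> int:
    """c**(1 + p) with c = s + t, the constant picked in the proof."""
    return (s + t) ** (1 + p)

# Check id_count_bound <= second term <= c**(1 + p) on a grid of small values.
for s in range(1, 6):
    for t in range(1, 6):
        for p in range(0, 10):
            assert id_count_bound(s, t, p) <= second_binomial_term(s, t, p) <= theorem_bound(s, t, p)
```

With s = 3 states, t = 2 tape symbols, and p = 4 cells, for instance, there are at most 3 · 4 · 2⁴ = 192 ID's, while c¹⁺ᵖ = 5⁵ = 3125.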


We can use Theorem 11.3 to convert any polynomial-space-bounded TM 
into an equivalent one that always halts after making at most an exponential 
number of moves. The essential point is that, since we know the TM accepts 
within an exponential number of moves, we can count how many moves have 
been made, and we can cause the TM to halt if it has made enough moves 
without accepting. 


Theorem 11.4: If L is a language in PS (respectively NPS), then L is accepted
by a polynomial-space-bounded deterministic (respectively nondeterministic) TM
that halts after making at most $c^{q(n)}$ moves, for some polynomial
q(n) and constant c > 1.


PROOF: We’ll prove the statement for deterministic TM’s; the same argument
applies to NTM’s. We know L is accepted by a TM $M_1$ that has a polynomial
space bound p(n). Then by Theorem 11.3, if $M_1$ accepts w it does so in at most
$c^{1+p(|w|)}$ steps.

Design a new TM $M_2$ that has two tapes. On the first tape, $M_2$ simulates
$M_1$, and on the second tape, $M_2$ counts in base c up to $c^{1+p(|w|)}$. If $M_2$ reaches
this count, it halts without accepting. $M_2$ thus uses 1 + p(|w|) cells on the
second tape. We also assumed that $M_1$ uses no more than p(|w|) cells on its
tape, so $M_2$ uses no more than p(|w|) cells on its first tape as well.

If we convert $M_2$ to a one-tape TM $M_3$, we can be sure that $M_3$ uses no
more than 1 + p(n) cells of tape, on any input of length n. Although $M_3$ may use
the square of the running time of $M_2$, that time is not more than $O(c^{2p(n)})$.²
As $M_3$ makes no more than $d\,c^{2p(n)}$ moves for some constant d, we may pick
$q(n) = 2p(n) + \log_c d$. Then $M_3$ makes at most $c^{q(n)}$ steps. Since $M_2$ always
halts, $M_3$ always halts. Since $M_1$ accepts L, so do $M_2$ and $M_3$. Thus, $M_3$
satisfies the statement of the theorem.
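The counting idea behind $M_2$ can be sketched as a clocked simulation. In the Python below, the names clocked_run, step, and is_accepting are hypothetical: step maps an ID to the next one (or None when the machine halts), and the move counter is stored as base-c digits, mirroring $M_2$'s second tape, so it needs only k + 1 digits to count up to $c^k$. Only the deterministic case is shown.

```python
from typing import Callable, Optional, TypeVar

ID = TypeVar("ID")

def clocked_run(step: Callable[[ID], Optional[ID]],
                is_accepting: Callable[[ID], bool],
                start: ID, c: int, k: int) -> bool:
    """Simulate a deterministic machine from `start` for at most c**k moves
    (c >= 2). Returns True iff an accepting ID is reached within the budget."""
    counter = [0] * (k + 1)          # base-c digits, least significant first
    current = start
    while True:
        if is_accepting(current):
            return True
        if counter[k] == 1:          # counter now equals c**k: halt without accepting
            return False
        nxt = step(current)
        if nxt is None:              # machine halted without accepting
            return False
        current = nxt
        i = 0                        # add 1 to the base-c counter
        while counter[i] == c - 1:
            counter[i] = 0
            i += 1
        counter[i] += 1
```

A machine that cycles forever, such as step = lambda x: (x + 1) % 3 with no accepting ID, is thereby forced to halt and reject once the counter reaches $c^k$.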


²In fact, the general rule from Theorem 8.10 is not the strongest claim we can make.
Because only 1 + p(n) cells are used by any tape, the simulated tape heads in the
many-tapes-to-one construction can get only 1 + p(n) apart. Thus, $c^{1+p(n)}$ moves of
the multitape TM $M_2$ can be simulated in $O(p(n)\,c^{p(n)})$ steps, which is less than
the claimed $O(c^{2p(n)})$.
11.2.3 Deterministic and Nondeterministic Polynomial Space


Since the comparison between P and NP seems so difficult, it is surprising that
the same comparison between PS and NPS is easy: they are the same classes
of languages. The proof involves simulating a nondeterministic TM that has
a polynomial space bound p(n) by a deterministic TM with polynomial space
bound $O(p^2(n))$.

The heart of the proof is a deterministic, recursive test for whether a NTM
N can move from ID I to ID J in at most m moves. A DTM D systematically
tries all middle ID’s K to check whether I can become K in m/2 moves, and
then K can become J in m/2 moves. That is, imagine there is a recursive
function reach(I, J, m) that decides whether $I \vdash^* J$ in at most m moves.

Think of the tape of D as a stack, where the arguments of the recursive calls 
to reach are placed. That is, in one stack frame D holds [I, J,m]. A sketch of 
the algorithm executed by reach is shown in Fig. 11.3. 


BOOLEAN FUNCTION reach(I,J,m)
ID: I,J; INT: m;
BEGIN
    IF (m == 1) THEN /* basis */ BEGIN
        test if I == J or I can become J after one move;
        RETURN TRUE if so, FALSE if not;
    END
    ELSE /* inductive part */ BEGIN
        FOR each possible ID K DO
            IF (reach(I,K,m/2) AND reach(K,J,m/2)) THEN
                RETURN TRUE;
        RETURN FALSE;
    END;
END;


Figure 11.3: The recursive function reach tests whether one ID can become 
another within a stated number of moves 
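As a runnable counterpart to Fig. 11.3, the sketch below implements reach over an explicitly listed set of ID's. The names make_reach and moves are hypothetical: moves(I) is assumed to return the set of ID's reachable from I in one move. To handle odd m, it splits m into ⌈m/2⌉ and ⌊m/2⌋ rather than m/2 twice (a small deviation from the figure). The two recursive calls run one after the other, so the recursion depth is about log₂ m.

```python
from typing import Callable, Hashable, Iterable, Set

def make_reach(all_ids: Iterable[Hashable],
               moves: Callable[[Hashable], Set[Hashable]]) -> Callable[[Hashable, Hashable, int], bool]:
    """Build reach(I, J, m) for a nondeterministic machine whose ID's are
    all_ids and whose one-move successors are given by moves(I)."""
    ids = list(all_ids)

    def reach(i: Hashable, j: Hashable, m: int) -> bool:
        """True iff ID i can become ID j in at most m moves (m >= 1)."""
        if m == 1:                               # basis
            return i == j or j in moves(i)
        first, second = (m + 1) // 2, m // 2     # split m, even when m is odd
        # inductive part: try every middle ID K; only one recursive call is
        # active at a time, because `and` evaluates them in sequence
        return any(reach(i, k, first) and reach(k, j, second) for k in ids)

    return reach
```

For example, on a chain 0 → 1 → ⋯ → 7, reach(0, 7, 7) holds but reach(0, 7, 6) does not. The running time is exponential in the recursion depth, which is exactly the trade the proof makes: a great deal of time in exchange for little space.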


It is important to observe that, although reach calls itself twice, it makes
those calls in sequence, and therefore, only one of the calls is active at a time.
That is, if we start with a stack frame $[I_1, J_1, m]$, then at any time there is
only one call $[I_2, J_2, m/2]$, one call $[I_3, J_3, m/4]$, another $[I_4, J_4, m/8]$, and so
on, until at some point the third argument becomes 1. At that point, reach
can apply the basis step, and needs no more recursive calls. It tests whether I = J
or $I \vdash J$, returning TRUE if either holds and FALSE if neither does. Figure 11.4
suggests what the stack of the DTM D looks like when there are as many active
calls to reach as possible, given an initial move count of m.

Figure 11.4: Tape of a DTM simulating a NTM by recursive calls to reach (stack frames $[I_1, J_1, m]$, $[I_2, J_2, m/2]$, $[I_3, J_3, m/4]$, $[I_4, J_4, m/8]$, ...)


While it may appear that many calls to reach are possible, and the tape
of Fig. 11.4 can become very long, we shall show that it cannot become “too
long.” That is, if started with a move count of m, there can only be $\log_2 m$
stack frames on the tape at any one time. Since Theorem 11.4 assures us that
the NTM N cannot make more than $c^{p(n)}$ moves, m does not have to start
with a number greater than that. Thus, the number of stack frames is at most
$\log_2 c^{p(n)}$, which is O(p(n)). We now have the essentials behind the proof of
the following theorem.


Theorem 11.5: (Savitch’s Theorem) PS = NPS. 


PROOF: It is obvious that PS ⊆ NPS, since every DTM is technically a NTM
as well. Thus, we need only to show that NPS ⊆ PS; that is, if L is accepted
by some NTM N with space bound p(n), for some polynomial p(n), then L is
also accepted by some DTM D with polynomial space bound q(n), for some
other polynomial q(n). In fact, we shall show that q(n) can be chosen to be on
the order of the square of p(n).

First, we may assume by Theorem 11.3 that if N accepts, it does so within
$c^{1+p(n)}$ steps for some constant c. Given input w of length n, D discovers what
N does with input w by repeatedly placing the triple $[I_0, J, m]$ on its tape and
calling reach with these arguments, where:


1. $I_0$ is the initial ID of N with input w.


2. J is any accepting ID that uses at most p(n) tape cells; the different J’s 
are enumerated systematically by D, using a scratch tape. 


3. $m = c^{1+p(n)}$.


We argued above that there will never be more than $\log_2 m$ recursive calls
that are active at the same time, i.e., one with third argument m, one with
m/2, one with m/4, and so on, down to 1. Thus, there are no more than $\log_2 m$
stack frames on the stack, and $\log_2 m$ is O(p(n)).

Further, the stack frames themselves take O(p(n)) space. The reason is that
the two ID’s each require only 1 + p(n) cells to write down, and if we write m
in binary, it requires about $\log_2 c^{1+p(n)}$ cells, which is O(p(n)). Thus, the entire
stack frame, consisting of two ID’s and an integer, takes O(p(n)) space.

Since D can have O(p(n)) stack frames at most, the total amount of space
used is $O(p^2(n))$. This amount of space is a polynomial if p(n) is polynomial,
so we conclude that L has a DTM that is polynomial-space bounded.
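The space accounting in this proof can be collected into a single worked derivation. It restates the argument above, counting the frames as 1 + log₂ m so that the basis call is included, with m = c^{1+p(n)}:

```latex
\begin{aligned}
\#\text{frames} &\le 1 + \log_2 m = 1 + \log_2 c^{1+p(n)}
                 = 1 + \bigl(1 + p(n)\bigr)\log_2 c = O\bigl(p(n)\bigr),\\
\text{cells per frame} &\le \underbrace{2\bigl(1 + p(n)\bigr)}_{I,\;J}
                 + \underbrace{O(\log_2 m)}_{m \text{ in binary}} = O\bigl(p(n)\bigr),\\
\text{space used by } D &\le \#\text{frames} \times \text{cells per frame}
                 = O\bigl(p(n)\bigr)\cdot O\bigl(p(n)\bigr) = O\bigl(p^2(n)\bigr).
\end{aligned}
```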


In summary, we can extend what we know about complexity classes to in- 
clude the polynomial-space classes. The complete diagram is shown in Fig. 11.5. 
Figure 11.5: Known relationships among classes of languages (outermost region: Recursive)


11.3 A Problem That Is Complete for PS 


In this section, we shall introduce a problem called “quantified boolean formu- 
las” and show that it is complete for PS. 


11.3.1 PS-Completeness 
We define a problem P to be complete for PS (PS-complete) if: 


1. P is in PS.


2. All languages L in PS are polynomial-time reducible to P. 


Notice that, although we are thinking about polynomial space, not time, the 
requirement for PS-completeness is similar to the requirement for NP-com- 
pleteness: the reduction must be performed in polynomial time. The reason 
is that we want to know that, should some PS-complete problem turn out to 
be in P, then P = PS, and also if some PS-complete problem is in NP, then
NP = PS. If the reduction were only in polynomial space, then the size of the 
output might be exponential in the size of the input, and therefore we could 
not draw the conclusions of the following theorem. However, since we focus on 
polynomial-time reductions, we get the desired relationships. 


Theorem 11.6: Suppose P is a PS-complete problem. Then: 
a) If P is in P, then P = PS.


b) If P is in NP, then NP = PS. 
PROOF: Let us prove (a). For any L in PS, we know there is a polynomial-time
reduction of L to P. Let this reduction take time q(n). Also, suppose P is in 
P, and therefore has a polynomial-time algorithm; say this algorithm runs in 
time p(n). 

Given a string w, whose membership in L we wish to test, we can use the 
reduction to convert it to a string x that is in P if and only if w is in L. Since 
the reduction takes time q(|w|), the string x cannot be longer than q(|w|). We 
may test membership of x in P in time p(|x|), which is at most p(q(|w|)), a polynomial
in |w|. We conclude that there is a polynomial-time algorithm for L. 

Therefore, every language L in PS is in P. Since containment of P in PS is 
obvious, we conclude that if P is in P, then P = PS. The proof for (b), where 
P is in NP, is quite similar, and we shall leave it to the reader.


11.3.2 Quantified Boolean Formulas 


We are going to exhibit a problem P that is complete for PS. But first, we need 
to learn the terms in which this problem, called “quantified boolean formulas” 
or QBF, is defined. 

Roughly, a quantified boolean formula is a boolean expression with the
addition of the operators ∀ (“for all”) and ∃ (“there exists”). The expression
(∀x)(E) means that E is true when all occurrences of x in E are replaced by 1
(true), and also true when all occurrences of x are replaced by 0 (false). The
expression (∃x)(E) means that E is true either when all occurrences of x are
replaced by 1 or when all occurrences of x are replaced by 0, or both.

To simplify our description, we shall assume that no QBF contains two or
more quantifications (∀ or ∃) of the same variable x. This restriction is not
essential, and corresponds roughly to disallowing two different functions in a
program from using the same local variable.³ Formally, the quantified boolean
formulas are defined as follows:


1. 0 (false), 1 (true), and any variable are QBF’s.


2. If E and F are QBF’s then so are (E), ¬(E), (E) ∧ (F), and (E) ∨ (F),
representing a parenthesized E, the negation of E, the AND of E and
F, and the OR of E and F, respectively. Parentheses may be removed if
they are redundant, using the usual precedence rules: NOT, then AND,
then OR (lowest). We shall also tend to use the “arithmetic” style of
representing AND and OR, where AND is represented by juxtaposition
(no operator) and OR is represented by +. That is, we often use (E)(F)
in place of (E) ∧ (F) and use (E) + (F) in place of (E) ∨ (F).


3. If E is a QBF that does not include a quantification of the variable x,
then (∀x)(E) and (∃x)(E) are QBF’s. We say that the scope of x is the
expression E. Intuitively, x is only defined within E, much as a variable
in a program has a scope that is the function in which it
is declared. Parentheses around E (but not around the quantification)
can be removed if there is no ambiguity. However, to avoid an excess of
nested parentheses, we shall write a chain of quantifiers such as

$$(\forall x)\Bigl((\exists y)\bigl((\forall z)(E)\bigr)\Bigr)$$

with only the one pair of parentheses around E, rather than one pair for
each quantifier on the chain, i.e., as (∀x)(∃y)(∀z)(E).


³We can always rename one of two distinct uses of the same variable name, either in
programs or in quantified boolean formulas. For programs, there is no reason to avoid reuse
of the same local name, but in QBF’s we find it convenient to assume there is no reuse.


Example 11.7: Here is an example of a QBF: 


$$(\forall x)\bigl((\exists y)(xy) + (\forall z)(\neg x + z)\bigr) \tag{11.1}$$


Starting with the variables x and y, we connect them with AND and then
apply the quantifier (∃y) to make the subexpression (∃y)(xy). Similarly, we
construct the boolean expression ¬x + z and apply the quantifier (∀z) to make
the subexpression (∀z)(¬x + z). Then, we combine these two expressions with
an OR; no parentheses are necessary, because + (OR) has lowest precedence.
Finally, we apply the (∀x) quantifier to this expression to produce the QBF
stated.


11.3.3 Evaluating Quantified Boolean Formulas 


We have yet to define formally what the meaning of a QBF is. However, if we
read ∀ as “for all” and ∃ as “exists,” we can get the intuitive idea. The QBF
asserts that for all x (i.e., x = 0 or x = 1), either there exists y such that both
x and y are true, or for all z, ¬x + z is true. This statement happens to be
true. To see why, note that if x = 1, then we can pick y = 1 and make xy true.
If x = 0, then ¬x + z is true for both values of z.

If a variable x is in the scope of some quantifier of x, then that use of x is 
said to be bound. Otherwise, an occurrence of x is free. 


Example 11.8: Each use of a variable in the QBF of Equation (11.1) is bound,
because it is in the scope of the quantifier for that variable. For instance, the
scope of the variable y, quantified in (∃y)(xy), is the expression xy. Thus, the
occurrence of y there is bound. The use of x in xy is bound to the quantifier
(∀x) whose scope is the entire expression.
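Bound and free occurrences can be computed mechanically. The sketch below assumes a hypothetical tuple encoding of QBF's as trees, in which parentheses are implicit: ("const", 0 or 1), ("var", name), ("not", E), ("and", E, F), ("or", E, F), ("forall", x, E), and ("exists", x, E). The function name free_vars is likewise ours.

```python
from typing import Set, Tuple

QBF = Tuple  # hypothetical tuple encoding described above

def free_vars(e: QBF) -> Set[str]:
    """Variables with at least one free (unbound) occurrence in e."""
    tag = e[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {e[1]}
    if tag == "not":
        return free_vars(e[1])
    if tag in ("and", "or"):
        return free_vars(e[1]) | free_vars(e[2])
    if tag in ("forall", "exists"):
        return free_vars(e[2]) - {e[1]}   # occurrences of x in its scope are bound
    raise ValueError(f"unknown node {tag!r}")

# Equation (11.1): (∀x)((∃y)(xy) + (∀z)(¬x + z))
EXISTS_Y = ("exists", "y", ("and", ("var", "x"), ("var", "y")))
FORALL_Z = ("forall", "z", ("or", ("not", ("var", "x")), ("var", "z")))
EQ_11_1 = ("forall", "x", ("or", EXISTS_Y, FORALL_Z))
```

Taken by itself, the subexpression (∃y)(xy) has x free; only inside the scope of (∀x) does every occurrence become bound, so free_vars(EQ_11_1) is empty.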


The value of a QBF that has no free variables is either 0 or 1 (i.e., false or 
true, respectively). We can compute the value of such a QBF by induction on 
the length n of the expression. 


BASIS: If the expression is of length 1, it can only be a constant 0 or 1, because
any variable would be free. The value of that expression is itself. 
INDUCTION: Suppose we are given an expression with no free variables and
length n > 1, and we can evaluate any expression of shorter length, as long as 
that expression has no free variables. There are six possible forms such a QBF 
can have: 


1. The expression is of the form (E). Then E is of length n − 2 and can be
evaluated to be either 0 or 1. The value of (E) is the same.


2. The expression is of the form ¬E. Then E is of length n − 1 and can be
evaluated. If E = 1, then ¬E = 0, and vice versa.


3. The expression is of the form EF. Then both E and F are shorter than
n, and so can be evaluated. The value of EF is 1 if both E and F have
the value 1, and EF = 0 if either is 0.


4. The expression is of the form E + F. Then both E and F are shorter
than n, and so can be evaluated. The value of E + F is 1 if either E or
F has the value 1, and E + F = 0 if both are 0.


5. If the expression is of the form (∀x)(E), first replace all occurrences of x
in E by 0 to get the expression $E_0$, and also replace each occurrence of x
in E by 1, to get the expression $E_1$. Observe that $E_0$ and $E_1$ both:


(a) Have no free variables, because any free variable in $E_0$ or $E_1$ could
not be x, and therefore would be some other variable that is free in E,
and hence free in (∀x)(E), which is impossible.


(b) Have length n − 6, and thus are shorter than n.


Evaluate $E_0$ and $E_1$. If both have value 1, then (∀x)(E) has value 1;
otherwise it has the value 0. Note how this rule reflects the “for all x”
interpretation of (∀x).


6. If the given expression is (∃x)(E), then proceed as in (5), constructing
$E_0$ and $E_1$, and evaluating them. If either $E_0$ or $E_1$ has value 1, then
(∃x)(E) has value 1; otherwise it has value 0. Note that this rule reflects
the “exists x” interpretation of (∃x).
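The inductive rules translate directly into a recursive evaluator. The sketch below uses a hypothetical tuple encoding of QBF's as trees, so rule (1) for parentheses has no counterpart; the names substitute and evaluate are ours, not part of the text. Like the rules themselves, it may take time exponential in the number of quantifiers.

```python
from typing import Tuple

# Hypothetical encoding: ("const", 0|1), ("var", x), ("not", E),
# ("and", E, F), ("or", E, F), ("forall", x, E), ("exists", x, E).
QBF = Tuple

def substitute(e: QBF, x: str, value: int) -> QBF:
    """Replace every free occurrence of variable x in e by the constant value."""
    tag = e[0]
    if tag == "const":
        return e
    if tag == "var":
        return ("const", value) if e[1] == x else e
    if tag == "not":
        return ("not", substitute(e[1], x, value))
    if tag in ("and", "or"):
        return (tag, substitute(e[1], x, value), substitute(e[2], x, value))
    if e[1] == x:                 # x re-quantified here (excluded by the no-reuse rule)
        return e
    return (tag, e[1], substitute(e[2], x, value))

def evaluate(e: QBF) -> int:
    """Value (0 or 1) of a QBF with no free variables."""
    tag = e[0]
    if tag == "const":            # basis
        return e[1]
    if tag == "var":
        raise ValueError(f"free variable {e[1]!r}")
    if tag == "not":              # rule (2)
        return 1 - evaluate(e[1])
    if tag == "and":              # rule (3)
        return evaluate(e[1]) and evaluate(e[2])
    if tag == "or":               # rule (4)
        return evaluate(e[1]) or evaluate(e[2])
    e0 = substitute(e[2], e[1], 0)
    e1 = substitute(e[2], e[1], 1)
    if tag == "forall":           # rule (5)
        return evaluate(e0) and evaluate(e1)
    if tag == "exists":           # rule (6)
        return evaluate(e0) or evaluate(e1)
    raise ValueError(f"unknown node {tag!r}")

# Equation (11.1): (∀x)((∃y)(xy) + (∀z)(¬x + z))
EQ_11_1 = ("forall", "x", ("or",
           ("exists", "y", ("and", ("var", "x"), ("var", "y"))),
           ("forall", "z", ("or", ("not", ("var", "x")), ("var", "z")))))
```

Here evaluate(EQ_11_1) returns 1, in agreement with the hand evaluation of Example 11.9.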


Example 11.9: Let us evaluate the QBF of Equation (11.1). It is of the form
(∀x)(E), so we must first evaluate $E_0$, which is:


$$(\exists y)(0y) + (\forall z)(\neg 0 + z) \tag{11.2}$$


The value of this expression depends on the values of the two expressions
connected by the OR: (∃y)(0y) and (∀z)(¬0 + z); $E_0$ has value 1 if either of those
expressions does. To evaluate (∃y)(0y), we must substitute y = 0 and y = 1 in
subexpression 0y, and check that at least one of them has the value 1. However,
both 0 ∧ 0 and 0 ∧ 1 have the value 0, so (∃y)(0y) has value 0.⁴

Fortunately, (∀z)(¬0 + z) has value 1, as we can see by substituting both
z = 0 and z = 1. Since ¬0 = 1, the two expressions we must evaluate are 1 ∨ 0
and 1 ∨ 1. Since both have value 1, we know that (∀z)(¬0 + z) has value 1. We
now conclude that $E_0$, which is Equation (11.2), has value 1.

We must also check that $E_1$, which we get by substituting x = 1 in
Equation (11.1):


$$(\exists y)(1y) + (\forall z)(\neg 1 + z) \tag{11.3}$$


also has value 1. Expression (∃y)(1y) has value 1, as we can see by substituting
y = 1. Thus, $E_1$, Equation (11.3), has value 1. We conclude that the entire
expression, Equation (11.1), has value 1.


11.3.4 PS-Completeness of the QBF Problem 


We can now define the quantified boolean formula problem: Given a QBF with 
no free variables, does it have the value 1? We shall refer to this problem 
as QBF, while continuing also to use QBF as an abbreviation for “quantified 
boolean formula.” The context should allow us to avoid confusion. 

We shall show that the QBF problem is complete for PS. The proof com- 
bines ideas from Theorems 10.9 and 11.5. From Theorem 10.9, we use the idea 
of representing a computation of a TM by logical variables each of which tells 
whether a certain cell has a certain value at a certain time. However, when we 
were dealing with polynomial time, as we were in Theorem 10.9, there were only 
polynomially many variables to concern us. We were thus able to generate, in 
polynomial time, an expression saying that the TM accepted its input. When 
we deal with a polynomial space bound, the number of ID’s in the computation 
can be exponential in the input size, so we cannot, in polynomial time, write 
a boolean expression to say that the computation is correct. Fortunately, we 
are given a more powerful language to express what we need to say, and the 
availability of quantifiers lets us write a polynomial-length QBF that says the 
polynomial-space-bounded TM accepts its input. 

From Theorem 11.5 we use the idea of “recursive doubling” to express the 
idea that one ID can become another in some large number of moves. That is, 
to say that ID I can become ID J in m moves, we say that there exists some
ID K such that I becomes K in m/2 moves and K becomes J in another m/2 
moves. The language of quantified boolean formulas lets us say these things in 
a polynomial-length expression, even if m is exponential in the length of the 
input. 


⁴Notice our use of alternative notations for AND and OR, since we cannot use juxtaposition
and + for expressions involving 0’s and 1’s without making the expressions look either like 
multidigit numbers or arithmetic addition. We hope the reader can accept both notations as 
standing for the same logical operators. 
Before proceeding to the proof that every language in PS is polynomial-
time reducible to QBF, we need to show that QBF is in PS. Even this part of 
the PS-completeness proof requires some thought, so we isolate it as a separate 
theorem. 


Theorem 11.10: QBF is in PS. 


PROOF: We discussed in Section 11.3.3 the recursive process for evaluating a 
QBF F. We can implement this algorithm using a stack, which we may store on 
the tape of a Turing machine, as we did in the proof of Theorem 11.5. Suppose 
F is of length n. Then we create a record of length O(n) for F that includes F 
itself and space for a notation about which subexpression of F we are working 
on. Two examples among the six possible forms of F will make the evaluation 
process clear. 


1. Suppose $F = F_1 + F_2$. Then we do the following:

(a) Place $F_1$ in its own record to the right of the record for F.
(b) Recursively evaluate $F_1$.
(c) If the value of $F_1$ is 1, return the value 1 for F.
(d) But if the value of $F_1$ is 0, replace its record by a record for $F_2$ and
recursively evaluate $F_2$.
(e) Return as the value of F whatever value $F_2$ returns.

2. Suppose F = (∃x)(E). Then do the following:

(a) Create the expression $E_0$ by substituting 0 for each occurrence of x,
and place $E_0$ in a record of its own, to the right of the record for F.
(b) Recursively evaluate $E_0$.
(c) If the value of $E_0$ is 1, then return 1 as the value of F.
(d) But if the value of $E_0$ is 0, create $E_1$ by substituting 1 for x in E.
(e) Replace the record for $E_0$ by a record for $E_1$, and recursively evaluate
$E_1$.
(f) Return as the value of F whatever value $E_1$ returns.


We shall leave to you the similar steps that will evaluate F for the cases that
F is of the other four possible forms: $F_1F_2$, ¬E, (E), or (∀x)(E). The basis
case, where F is a constant, requires us to return that constant, and no further
records are created on the tape.

In any case, we note that to the right of the record for an expression of
length m will be a record for an expression of length less than m. Note that
even though we often have to evaluate two different subexpressions, we do so
one at a time. Thus, in case (1) above, there are never records for both $F_1$ or
any of its subexpressions and $F_2$ or its subexpressions on the tape at the same
time. The same is true of $E_0$ and $E_1$ in case (2) above.
Therefore, if we start with an expression of length n, there can never be more
than n records on the stack. Also, each record is O(n) in length. Thus, the
entire tape never grows longer than $O(n^2)$. We now have a construction for a
polynomial-space-bounded TM that accepts QBF; its space bound is quadratic.
Note that this algorithm will typically take time that is exponential in n, so it
is not polynomial-time bounded.


Now, we turn to the reduction from an arbitrary language L in PS to the 
problem QBF. We would like to use propositional variables $y_{ijA}$ as we did in
Theorem 10.9 to assert that the jth position in the ith ID is A. However, since 
there are exponentially many ID’s, we could not take an input w of length n 
and even write down these variables in time that is polynomial in n. Instead, we 
exploit the availability of quantifiers to make the same set of variables represent 
many different ID’s. The idea appears in the proof below. 


Theorem 11.11: The problem QBF is PS-complete. 


PROOF: Let L be in PS, accepted by a deterministic TM M that uses p(n) 
space at most, on input of length n. By Theorem 11.3, we know there is a 
constant c such that M accepts within $c^{1+p(n)}$ moves if it accepts an input of
length n. We shall describe how, in polynomial time, we take an input w of 
length n and construct from w a QBF E that has no free variables, and has the 
value 1 if and only if w is in L(M). 

In writing E, we shall need to introduce polynomially many variable
ID's, which are sets of variables $y_{jA}$ that assert the jth position of the
represented ID has symbol A. We allow j to range from 0 to p(n). Symbol A is either
a tape symbol or state of M. Thus, the number of propositional variables in a 
variable ID is polynomial in n. We assume that all the propositional variables 
in different variable ID’s are distinct; that is, no propositional variable belongs 
to two different variable ID’s. As long as there is only a polynomial number of 
variable ID’s, the total number of propositional variables is polynomial. 

It is convenient to introduce a notation $(\exists I)$, where I is a variable ID.
This quantifier stands for $(\exists x_1)(\exists x_2)\cdots(\exists x_m)$, where
$x_1, x_2, \ldots, x_m$ are all the propositional variables in the variable ID I.
Likewise, $(\forall I)$ stands for the $\forall$ quantifier applied to all the
propositional variables in I.

The QBF we construct for w has the form: 


$$(\exists I_0)(\exists I_f)(S \wedge N \wedge F)$$
where: 


1. $I_0$ and $I_f$ are variable ID's representing the initial and accepting ID's,
respectively.


2. S is an expression that says “starts right”; i.e., $I_0$ is truly the initial ID
of M with input w.


3. N is an expression that says “moves right”; i.e., M takes $I_0$ to $I_f$.


4. F is an expression that says “finishes right”; i.e., $I_f$ is an accepting ID.


Note that, while the entire expression has no free variables, the variables of $I_0$
will appear as free variables in S, the variables of $I_f$ appear free in F, and both
groups of variables appear free in N.


Starts Right 


S is the logical AND of literals; each literal is one of the variables of $I_0$. S
has literal $y_{jA}$ if the jth position of the initial ID with input w is A, and has
literal $\neg y_{jA}$ if not. That is, if $w = a_1a_2\cdots a_n$, then
$y_{0q_0}, y_{1a_1}, y_{2a_2}, \ldots, y_{na_n}$, and all $y_{jB}$, for
$j = n+1, n+2, \ldots, p(n)$ appear without negation, and all other
variables of $I_0$ are negated. Here, $q_0$ is assumed to be the initial state of M,
and B is its blank.
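The list of literals in S can be generated mechanically. Here is a small Python sketch. The encoding is hypothetical: a literal is a triple `(j, A, positive)`, and `symbols` holds every tape symbol and state of M. S has $(p(n)+1)$ times $|symbols|$ literals, so its size is O(p(n)) for a fixed M.

```python
def starts_right(w, q0, blank, symbols, p):
    """Return S as a list of literals (j, A, positive) over the variables y_jA of I_0.

    Hypothetical encoding: the initial ID is laid out as q0 a_1 ... a_n followed
    by blanks through position p, and `symbols` holds every tape symbol and state."""
    if p < len(w):
        raise ValueError("p must be at least |w|")
    initial = [q0] + list(w) + [blank] * (p - len(w))     # positions 0..p
    return [(j, A, A == actual)                             # y_jA or its negation
            for j, actual in enumerate(initial)
            for A in sorted(symbols)]


lits = starts_right("01", "q0", "B", {"0", "1", "B", "q0", "q1"}, p=3)
print([(j, A) for j, A, positive in lits if positive])
# prints [(0, 'q0'), (1, '0'), (2, '1'), (3, 'B')]
```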


Finishes Right 


In order for $I_f$ to be an accepting ID, it must have an accepting state. Therefore,
we write F as the logical OR of those variables $y_{jA}$, chosen from the
propositional variables of $I_f$, for which A is an accepting state. Position j is
arbitrary.
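A sketch in the same hypothetical encoding: F collects one variable of $I_f$ for each position and each accepting state, and these variables are joined by OR.

```python
def finishes_right(p, accepting):
    """Return F as the list of variables y_jA of I_f that are ORed together:
    one for each position j in 0..p and each accepting state A."""
    return [(j, A) for j in range(p + 1) for A in sorted(accepting)]


print(finishes_right(2, {"qf"}))   # prints [(0, 'qf'), (1, 'qf'), (2, 'qf')]
```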


Next Move Is Right 


The expression N is constructed recursively in a way that lets us double the
number of moves considered by adding only O(p(n)) symbols to the expression
being constructed, and (more importantly) by spending only O(p(n)) time
writing the expression. It is useful to have the shorthand $I = J$, where I and
J are variable ID's, to stand for the logical AND of expressions that equate
each of the corresponding variables of I and J. That is, if I consists of variables
$y_{jA}$ and J consists of variables $z_{jA}$, then $I = J$ is the AND of the expressions
$(y_{jA} \wedge z_{jA}) \vee (\neg y_{jA} \wedge \neg z_{jA})$, where j ranges from 0 to p(n), and A is any tape symbol
or state of M.
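The shorthand $I = J$ expands into one clause per position and symbol, and a few lines of Python can produce that expansion. The names `I[j,A]` and `J[j,A]` are a hypothetical rendering of $y_{jA}$ and $z_{jA}$. The output has $(p(n)+1)$ times $|symbols|$ clauses, which is O(p(n)).

```python
def equal_ids(I, J, p, symbols):
    """Formula for I = J: the AND, over j in 0..p and each symbol A, of
    (I[j,A] ∧ J[j,A]) ∨ (¬I[j,A] ∧ ¬J[j,A])."""
    clauses = []
    for j in range(p + 1):
        for A in sorted(symbols):
            y, z = f"{I}[{j},{A}]", f"{J}[{j},{A}]"
            clauses.append(f"(({y} ∧ {z}) ∨ (¬{y} ∧ ¬{z}))")
    return " ∧ ".join(clauses)


print(equal_ids("I", "J", 0, {"a"}))
# prints ((I[0,a] ∧ J[0,a]) ∨ (¬I[0,a] ∧ ¬J[0,a]))
```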

We now construct expressions $N_i(I, J)$, for $i = 1, 2, 4, 8, \ldots$, to mean that
$I \vdash^* J$ by i or fewer moves. In these expressions, only the propositional variables
of variable ID's I and J are free; all other propositional variables are bound.


BASIS: For i = 1, $N_1(I, J)$ asserts that either $I = J$, or $I \vdash J$. We just
discussed how to express the condition $I = J$ above. For the condition $I \vdash J$,
we refer you to the discussion in the “next move is right” portion of the proof of
Theorem 10.9, where we deal with exactly the same problem of asserting that
one ID follows from the previous one. The expression $N_1$ is the logical OR of
these two expressions. Note that we can write $N_1$ in O(p(n)) time.


INDUCTION: We construct $N_{2i}(I, J)$ from $N_i$. In the box “This Construction
of $N_{2i}$ Doesn't Work” we point out that the direct approach, using two copies
of $N_i$ to build $N_{2i}$, doesn't give us the time and space bounds we need. The
correct way to write $N_{2i}$ is to use one copy of $N_i$ in the expression, passing both
the arguments (I, K) and (K, J) to the same expression. That is, $N_{2i}(I, J)$ will
use one subexpression $N_i(P, Q)$. We write $N_{2i}(I, J)$ to assert that there exists
ID K such that for all ID's P and Q, either:


1. $(P, Q) \neq (I, K)$ and $(P, Q) \neq (K, J)$, or

2. $N_i(P, Q)$ is true.

Put equivalently, $N_i(I, K)$ and $N_i(K, J)$ are true, and we don't care about
whether $N_i(P, Q)$ is true otherwise. The following is a QBF for $N_{2i}(I, J)$:

$$N_{2i}(I, J) = (\exists K)(\forall P)(\forall Q)\Bigl(N_i(P, Q) \vee \bigl(\neg(I = P \wedge K = Q) \wedge \neg(K = P \wedge J = Q)\bigr)\Bigr)$$

Notice that we can write $N_{2i}$ in the time it takes us to write $N_i$, plus O(p(n))
additional work.


This Construction of $N_{2i}$ Doesn't Work


Our first instinct about constructing $N_{2i}$ from $N_i$ might be to use a
straightforward divide-and-conquer approach: if $I \vdash^* J$ in 2i or fewer
moves, then there must be an ID K such that both $I \vdash^* K$ and $K \vdash^* J$ in i
moves or fewer. However, if we write down the formula that expresses this
idea, say $N_{2i}(I, J) = (\exists K)(N_i(I, K) \wedge N_i(K, J))$, we wind up doubling
the length of the expression as we double i. Since i must be exponential
in n in order to express all possible computations of M, we would spend
too much time writing down N, and N would be exponential in length.
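The difference between the two constructions is easy to see if we build both symbolically. The Python sketch below is illustrative only and makes some assumptions. The strings `N1(I,J)` and `I=P` are placeholders for the O(p(n))-size basis and equality formulas, and the fresh names `V1`, `V2`, … stand for new variable ID's. The naive formula contains i copies of $N_1$. The one-copy formula contains a single copy, plus three quantified ID's for each doubling.

```python
import itertools


def make_fresh():
    """Supply new variable-ID names V1, V2, ... for K, P, and Q."""
    counter = itertools.count(1)
    return lambda: f"V{next(counter)}"


def naive_n(i, I, J, fresh):
    """Two-copy construction (the one that does not work); i a power of 2."""
    if i == 1:
        return f"N1({I},{J})"
    K = fresh()
    return (f"(∃{K})({naive_n(i // 2, I, K, fresh)} ∧ "
            f"{naive_n(i // 2, K, J, fresh)})")


def one_copy_n(i, I, J, fresh):
    """One-copy construction: (∃K)(∀P)(∀Q)(N_i(P,Q) ∨ (¬(I=P ∧ K=Q) ∧ ¬(K=P ∧ J=Q)))."""
    if i == 1:
        return f"N1({I},{J})"
    K, P, Q = fresh(), fresh(), fresh()
    inner = one_copy_n(i // 2, P, Q, fresh)
    return (f"(∃{K})(∀{P})(∀{Q})({inner} ∨ "
            f"(¬({I}={P} ∧ {K}={Q}) ∧ ¬({K}={P} ∧ {J}={Q})))")


for k in range(1, 6):
    i = 2 ** k
    naive = naive_n(i, "I", "J", make_fresh())
    single = one_copy_n(i, "I", "J", make_fresh())
    # columns: i, copies of N1 (naive), copies of N1 (one-copy), lengths
    print(i, naive.count("N1("), single.count("N1("), len(naive), len(single))
```

The length of the naive string roughly doubles with each doubling of i. The one-copy string grows by a nearly fixed amount, apart from longer fresh names.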


To complete the construction of N, we must construct $N_m$ for the smallest
m that is a power of 2 and also at least $c^{1+p(n)}$, the maximum possible number
of moves TM M can make before accepting input w of length n. The number
of times we must apply the inductive step above is $\log_2(c^{1+p(n)})$, or O(p(n)).
Since each use of the inductive step takes time O(p(n)), we conclude that N
can be constructed in time $O(p^2(n))$.
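Written out as a short calculation, and assuming m is the smallest power of 2 that is at least $c^{1+p(n)}$ with c a constant that depends only on M, the count and the total time are:

$$
\log_2 m = \left\lceil \log_2 c^{1+p(n)} \right\rceil = \left\lceil \bigl(1+p(n)\bigr)\log_2 c \right\rceil = O\bigl(p(n)\bigr)
$$

$$
\underbrace{O\bigl(p(n)\bigr)}_{\text{basis } N_1} + \log_2 m \cdot O\bigl(p(n)\bigr) = O\bigl(p^2(n)\bigr)
$$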


Conclusion of the Proof of Theorem 11.11 
We have now shown how to transform input w into a QBF

$$(\exists I_0)(\exists I_f)(S \wedge N \wedge F)$$

in time that is polynomial in |w|. We have also argued why each of the expressions
S, N, and F is true if and only if its free variables represent ID's $I_0$
and $I_f$ that are respectively the initial and accepting ID's of a computation of
M on input w, and also $I_0 \vdash^* I_f$. That is, this QBF has value 1 if and only if
M accepts w.


11.3.5 Exercises for Section 11.3 
Exercise 11.3.1: Complete the proof of Theorem 11.10 by handling the cases:
a) $F = F_1 \wedge F_2$.
b) $F = (\forall x)(E)$.
c) $F = \neg(E)$.
d) $F = (E)$.


Exercise 11.3.2: Show that the following problem is PS-complete. Given a
regular expression E, is E equivalent to $\Sigma^*$, where Σ is the set of symbols that
appear in E? Hint: Instead of trying to reduce QBF to this problem, it might
be easier to show that any language in PS reduces to it. For each polynomial- 
space-bounded TM M, show how to take an input w for M and construct in 
polynomial time a regular expression that generates all strings that are not 
sequences of ID’s of M leading to acceptance of w. 


! Exercise 11.3.3: The Shannon Switching Game is as follows. We are given
a graph G with two terminal nodes s and t. There are two players, which we
may call SHORT and CUT. Alternately, with SHORT playing first, each player 
selects a vertex of G, other than s and t, which then belongs to that player for 
the rest of the game. SHORT wins by selecting a set of nodes that, with s and t, 
form a path in G from s to t. CUT wins if all the nodes have been selected, and 
SHORT has not selected a path from s to t. Show that the following problem is 
PS-complete: given G, can SHORT win no matter what choices CUT makes? 


11.4 Language Classes Based on Randomization 


We now turn our attention to two classes of languages that are defined by Tur- 
ing machines with the capability of using random numbers in their calculation. 
You are probably familiar with algorithms written in common programming 
languages that use a random-number generator for some useful purpose. Tech- 
nically, the function rand() or similarly named function that returns to you 
what appears to be a “random” or unpredictable number in fact executes a 
specific algorithm that can be simulated, although it is very hard to see a “pat- 
tern” in the sequence of numbers it produces. A simple example of such a 
function (not used in practice) would be a process of taking the previous in- 
teger in the sequence, squaring it, and taking the middle bits of the product. 
Numbers produced by a complex, mechanical process such as this are called 
pseudo-random numbers. 
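The "square and take the middle bits" generator mentioned above can be written in a few lines. This is a sketch for illustration only. The word size k and the seed are hypothetical parameters. Like the text's example, this generator is not suitable for real use, because its sequences quickly fall into short cycles.

```python
def middle_square_bits(seed, k=16, count=8):
    """Middle-square pseudo-random generator on k-bit integers: square the
    previous value (a number of up to 2k bits) and keep its middle k bits."""
    mask = (1 << k) - 1
    x = seed & mask
    out = []
    for _ in range(count):
        x = ((x * x) >> (k // 2)) & mask   # drop the low k/2 bits, keep the next k
        out.append(x)
    return out


print(middle_square_bits(12345))   # the same seed always yields the same sequence
```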
In this section, we shall define a type of Turing machine that models the
generation of random numbers and the use of those numbers in algorithms. We 
then define two classes of languages, RP and ZPP, that use this randomness 
and a polynomial time bound in different ways. Interestingly, these classes 
appear to include little that is not in P, but the differences are important. In 
particular, we shall see in Section 11.5 how some of the most essential matters 
regarding computer security are really questions about the relationship of these 
classes to P and NP. 


11.4.1 Quicksort: an Example of a Randomized Algorithm


You are probably familiar with the sorting algorithm called “Quicksort.” The 
essence of the algorithm is as follows. Given a list of elements $a_1, a_2, \ldots, a_n$ to
sort, we pick one of the elements, say $a_1$, and divide the list into those elements
that are $a_1$ or less and those that are larger than $a_1$. The selected element is
called the pivot. If we are careful with how the data is represented, we can 
separate the list of length n into two lists totaling n in length in time O(n). 
Moreover, we can then recursively sort the list of low (less than or equal to 
the pivot) elements and sort the list of high (greater than the pivot) elements 
independently, and the result will be a sorted list of all n elements. 

If we are lucky, the pivot will turn out to be a number in the middle of the 
sorted list, so the two sublists are each about n/2 in length. If we are lucky at 
each recursive stage, then after about $\log_2 n$ levels of recursion, we shall have
lists of length 1, and these lists are already sorted. Thus, the total work will be
O(log n) levels, each with O(n) work required, or O(n log n) time overall.

However, we may not be lucky. For example, if the list happens to be sorted 
to begin with, then picking the first element of each list will divide the list with 
one element in the low sublist and all the rest in the high sublist. If that is the 
case, Quicksort behaves much like Selection-Sort, and takes time proportional
to $n^2$ to sort n elements.

Thus, good implementations of Quicksort do not take mechanically any 
particular position on the list as the pivot. Rather, the pivot is chosen randomly 
from among all the elements on the list. That is, each of the n elements has 
probability 1/n of being chosen as the pivot. While we shall not show this
claim here,⁵ it turns out that the expected running time of Quicksort with this
randomization included is O(n log n). However, since by the tiniest of chances
each of the pivot choices could take the largest or smallest element, the worst-case
running time of Quicksort is still $O(n^2)$. Nevertheless, Quicksort is still the
method of choice in many applications (it is used in the UNIX sort command,
for example), since its expected running time is really quite good compared with
other approaches, even with methods that are O(n log n) in the worst case.
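Randomized Quicksort can be sketched in Python as follows. It is a sketch, not a tuned implementation, and it departs slightly from the two-way split above in one respect. The chosen pivot is set aside rather than placed in the low list. Without that change, a list of equal elements would put everything in the low list, and the recursion would never finish.

```python
import random


def quicksort(items, rng=random):
    """Randomized Quicksort: each position is the pivot with probability 1/len."""
    if len(items) <= 1:
        return list(items)
    k = rng.randrange(len(items))            # uniform random pivot position
    pivot = items[k]
    rest = items[:k] + items[k + 1:]         # set the pivot itself aside
    low = [x for x in rest if x <= pivot]    # elements no greater than the pivot
    high = [x for x in rest if x > pivot]    # elements greater than the pivot
    return quicksort(low, rng) + [pivot] + quicksort(high, rng)


print(quicksort([5, 3, 8, 1, 9, 2]))   # prints [1, 2, 3, 5, 8, 9]
```

Already-sorted input, the bad case for a fixed first-element pivot, is no worse on average than any other input here.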


⁵A proof and analysis of Quicksort's expected running time can be found in D. E. Knuth,
The Art of Computer Programming, Vol. III: Sorting and Searching, Addison-Wesley, 1973.
11.4.2 A Turing-Machine Model Using Randomization


To represent abstractly the ability of a Turing machine to make random choices, 
much like a program that calls a random-number generator one or more times, 
we shall use the variant of a multitape TM suggested in Fig. 11.6. The first tape 
holds the input, as is conventional for a multitape TM. The second tape also 
begins with nonblanks in its cells. In fact, in principle, its entire tape is covered 
with 0’s and 1’s, each chosen randomly and independently with probability 1/2 
of a 0 and the same probability of a 1. We shall refer to the second tape as 
the random tape. The third and subsequent tapes, if used, are initially blank 
and are used as “scratch tapes” by the TM if needed. We call this TM model 
a randomized Turing machine. 


[Figure: a finite control with heads on the input tape, the random-bits tape (... 00101000101001000010001111), and the scratch tape(s).]


Figure 11.6: A Turing machine with the capability of using randomly “generated”
numbers


Since it may not be realistic to imagine that we initialize the randomized 
TM by covering an infinite tape with random 0’s and 1’s, an equivalent view of 
this TM is that the second tape is initially blank. However, when the second 
head is scanning a blank, an internal “coin flip” occurs, and the randomized 
TM immediately writes either a 0 or a 1 on the tape cell scanned and leaves 
it there forever without change. In that way, there is no work — certainly not 
infinite work — done prior to starting the randomized TM. Yet the second tape 
appears to be covered with random 0’s and 1’s, since those random bits appear 
wherever the randomized TM’s second tape head actually looks. 
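The "flip a coin on first visit" view of the random tape corresponds closely to lazy generation in a program. Here is a minimal Python sketch, which assumes a pseudo-random generator in place of true coin flips. A bit is created only when a cell is scanned for the first time, and it never changes afterward.

```python
import random


class RandomTape:
    """Random tape that starts blank. The first time a cell is scanned, a fair
    coin is flipped and the bit is written there permanently, so later scans
    of the same cell see the same bit."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.cells = {}                   # position -> bit, scanned cells only

    def read(self, pos):
        if pos not in self.cells:         # head is on a blank: flip a coin
            self.cells[pos] = self.rng.randrange(2)
        return self.cells[pos]


tape = RandomTape(random.Random(1))
print([tape.read(i) for i in range(8)])   # eight bits, created on demand
```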


Example 11.12: We can implement the randomized version of Quicksort on a 
randomized TM. The important step is the recursive process of taking a sublist, 
which we assume is stored consecutively on the input tape and delineated by 
markers at both ends, picking a pivot at random, and dividing the sublist into 
low and high sub-sublists. The randomized TM does as follows: 
1. Suppose the sublist to be divided is of length m. Use about O(log m)
new random bits on the second tape to pick a random number i between 1
and m; the ith element of the sublist becomes the pivot. Note that we
may not be able to choose every integer between 1 and m with absolutely
equal probability, since m may not be a power of 2. However, if we take,
say, $\lceil 2\log_2 m \rceil$ bits from tape 2, think of them as a number in the range 0 to
about $m^2$, take its remainder when divided by m, and add 1, then we shall
get all numbers between 1 and m with probability that is close enough to
1/m to make Quicksort work properly.


2. Put the pivot on tape 3. 


3. Scan the sublist delineated on tape 1, copying those that are no greater 
than the pivot to tape 4. 


4. Again scan the sublist on tape 1, copying those elements greater than the 
pivot to tape 5. 


5. Copy tape 4 and then tape 5 to the space on tape 1 that formerly held 
the delineated sublist. Place a marker between the two lists. 


6. If either or both of the sub-sublists have more than one element, recur- 
sively sort them by the same algorithm. 
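The pivot-selection trick in step (1) can be checked exactly. The Python sketch below lists all $2^b$ equally likely bit strings, with $b = \lceil 2\log_2 m \rceil$, and tallies which index each one selects. Each index i then has count $\lfloor 2^b/m \rfloor$ or $\lceil 2^b/m \rceil$, so its probability differs from 1/m by less than $2^{-b} \le 1/m^2$. The function names are our own.

```python
import math
from collections import Counter


def bits_needed(m):
    """ceil(2 log2 m) random bits, giving a number in 0 .. 2**b - 1 with 2**b >= m*m."""
    return max(1, math.ceil(2 * math.log2(m)))


def pivot_index(bits, m):
    """Read the bits as a binary number r and return (r mod m) + 1, in 1..m."""
    r = int("".join(str(b) for b in bits), 2)
    return r % m + 1


def exact_distribution(m):
    """Probability of each index 1..m when all bits_needed(m) bits are fair coins."""
    b = bits_needed(m)
    counts = Counter(r % m + 1 for r in range(2 ** b))
    return {i: counts[i] / 2 ** b for i in range(1, m + 1)}


print(exact_distribution(3))   # prints {1: 0.375, 2: 0.3125, 3: 0.3125}
```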


Notice that this implementation of Quicksort takes O(n log n) time, even though 
the computing device is a multitape TM, rather than a conventional computer. 
However, the point of this example is not the running time but rather the use 
of the random bits on the second tape to cause random behavior of the Turing 
machine. 


11.4.3 The Language of a Randomized Turing Machine 


We are used to a situation where every Turing machine (or FA or PDA for 
that matter) accepts some language, even if that language is the empty set or 
the set of all strings over the input alphabet. When we deal with randomized 
Turing machines, we need to be more careful about what it means for the TM 
to accept an input, and it becomes possible that a randomized TM accepts no 
language at all. The problem is that when we consider what a randomized TM 
M does in response to an input w, we need to consider M with all possible 
contents for the random tape. It is entirely possible that M accepts with some 
random strings and rejects with others; in fact, if the randomized TM is to do 
anything more efficiently than a deterministic TM, it is essential that different
contents of the random tape lead to different behaviors.⁶


⁶You should be aware that the randomized TM described in Example 11.12 is not a
language-recognizing TM. Rather, it performs a transformation on its input, and the running 
time of the transformation, although not the outcome, depends on what was on the random 
tape. 
If we think of a randomized TM as accepting by entering a final state, as
for a conventional TM, then each input w to the randomized TM M has some
probability of acceptance, which is the fraction of the possible contents of the
random tape that lead to acceptance. Since there are an infinite number of
possible tape contents, we have to be somewhat careful computing this probability.
However, any sequence of moves leading to acceptance looks at only a
finite portion of the random tape, so whatever is seen there occurs with a finite
probability equal to $2^{-m}$ if m is the number of cells of the random tape that
have been scanned and influenced at least one move of the TM. An example
will illustrate the calculation in a very simple case.


Example 11.13: Our randomized TM M has the transition function displayed 
in Fig. 11.7. M uses only an input tape and the random tape. It behaves in 
a very simple manner, never changing a symbol on either tape, and moving 
its heads only to the right (direction R) or keeping them stationary (direction 
S). Although we have not defined a formal notation for the transitions of a 
randomized TM, the entries in Fig. 11.7 should be understandable; each row 
corresponds to a state, and each column corresponds to a pair of symbols XY, 
where X is the symbol scanned on the input tape, and Y is the symbol scanned 
on the random tape. The entry $qUVDE$ in the table means that the TM enters
state q, writes U on the input tape, writes V on the random tape, moves the
input head in direction D, and moves the head of the random tape in direction
E.

          00        01        10        11        B0        B1
→ q0      q1 00RS   q3 01SR   q2 10RS   q3 11SR
  q1      q1 00RS                                 q4 B0SS
  q2                          q2 10RS             q4 B0SS
  q3      q3 00RR                       q3 11RR   q4 B0SS   q4 B1SS
* q4


Figure 11.7: The transition function of a randomized Turing machine


Here is a summary of how M behaves on an input string w of 0's and 1's.
In the start state, $q_0$, M looks at the first random bit, and makes one of two
tests regarding w, depending on whether that random bit is 0 or 1.

If the random bit is 0, then M tests whether or not w consists of only one
symbol, 0 or 1. In this case, M looks at no more random bits, but keeps its
second tape head stationary. If the first bit of w is 0, then M goes to state $q_1$.
In that state, M moves right over 0's, but dies if it sees a 1. If M reaches the
first blank on the input tape while in state $q_1$, it goes to state $q_4$, the accepting
state. Similarly, if the first bit of w is 1, and the first random bit is 0, then
M goes to state $q_2$; in that state it checks if all the other bits of w are 1, and
accepts if so.

Now, let us consider what M does if the first random bit is 1. It compares 
w with the second and subsequent random bits, accepting only if they are the
same. Thus, in state $q_0$, scanning 1 on the second tape, M goes to state $q_3$.
Notice that when doing so, M moves the random-tape head right, so it gets to 
see a new random bit, while keeping the input-tape head stationary so all of 
w will be compared with random bits. In state q3, M matches the two tapes, 
moving both tape heads right. If it finds a mismatch at some point, it dies and 
fails to accept, while if it reaches the blank on the input tape, it accepts. 

Now, let us compute the probability of acceptance of certain inputs. First,
consider a homogeneous input, one that consists of only one symbol, such as $0^i$
for some $i \geq 1$. With probability 1/2, the first random bit will be 0, and if so,
then the test for homogeneity will succeed, and $0^i$ is surely accepted. However,
also with probability 1/2 the first random bit is 1. In that case, $0^i$ will be
accepted if and only if random bits 2 through i + 1 are all 0. That occurs with
probability $2^{-i}$. Thus, the total probability of acceptance of $0^i$ is

$$\frac{1}{2} + \frac{1}{2}2^{-i} = \frac{1}{2} + 2^{-(i+1)}$$

Now, consider the case of a heterogeneous input w, i.e., an input that consists
of both 0's and 1's, such as 00101. This input is never accepted if the first
random bit is 0. If the first random bit is 1, then its probability of acceptance is
$2^{-i}$, where i is the length of the input. Thus, the total probability of acceptance
of a heterogeneous input of length i is $2^{-(i+1)}$. For instance, the probability of
acceptance of 00101 is 1/64.
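These probabilities can be confirmed mechanically. The Python sketch below encodes the transition function of Fig. 11.7 as we read it from the description above. Written symbols are omitted, since M never changes a tape symbol. The simulation branches with weight 1/2 each time the random head reaches a cell it has not yet scanned, and it sums these weights over accepting runs. The result is therefore the sum of $2^{-m}$ over accepting runs, as in the text. The dictionary encoding and function names are our own.

```python
from fractions import Fraction

# Fig. 11.7 as a dictionary: (state, input symbol, random bit) ->
# (next state, input-head move, random-head move). "B" is the blank.
DELTA = {
    ("q0", "0", "0"): ("q1", "R", "S"), ("q0", "0", "1"): ("q3", "S", "R"),
    ("q0", "1", "0"): ("q2", "R", "S"), ("q0", "1", "1"): ("q3", "S", "R"),
    ("q1", "0", "0"): ("q1", "R", "S"), ("q1", "B", "0"): ("q4", "S", "S"),
    ("q2", "1", "0"): ("q2", "R", "S"), ("q2", "B", "0"): ("q4", "S", "S"),
    ("q3", "0", "0"): ("q3", "R", "R"), ("q3", "1", "1"): ("q3", "R", "R"),
    ("q3", "B", "0"): ("q4", "S", "S"), ("q3", "B", "1"): ("q4", "S", "S"),
}
ACCEPTING = {"q4"}


def acceptance_probability(w):
    """Exact probability that M accepts w, over all contents of the random tape."""
    def run(state, i, r, bits):
        if state in ACCEPTING:
            return Fraction(1)
        if r == len(bits):                   # random head on a fresh cell: branch
            return (run(state, i, r, bits + "0") + run(state, i, r, bits + "1")) / 2
        x = w[i] if i < len(w) else "B"
        move = DELTA.get((state, x, bits[r]))
        if move is None:                     # no move defined: M dies, no acceptance
            return Fraction(0)
        nxt, d_in, d_rand = move
        return run(nxt, i + (d_in == "R"), r + (d_rand == "R"), bits)
    return run("q0", 0, 0, "")


print(acceptance_probability("000"), acceptance_probability("00101"))   # prints 9/16 1/64
```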


Our conclusion is that we can compute a probability of acceptance of any 
given string by any given randomized TM. Whether or not the string is in the 
language depends on how “membership” in the language of a randomized TM 
is defined. We shall give two different definitions of acceptance in the next 
sections; each leads to a different class of languages. 


11.4.4 The Class RP 


The essence of our first class of languages, called RP, for “random polynomial,” 
is that to be in RP, a language L must be accepted by a randomized TM M 
in the following sense: 


1. If w is not in L, then the probability that M accepts w is 0. 
2. If w is in L, then the probability that M accepts w is at least 1/2. 


3. There is a polynomial T(n) such that if input w is of length n, then all 
runs of M, regardless of the contents of the random tape, halt after at 
most T(n) steps. 


Notice that there are two independent issues addressed by the definition 
of RP. Points (1) and (2) define a randomized Turing machine of a special 
Nondeterminism and Randomness


There are some superficial similarities between a randomized TM and a
nondeterministic TM. We could imagine that the nondeterministic choices
of an NTM are governed by a tape with random bits, and every time the
NTM has a choice of moves it consults the random tape and picks from
among the choices with equal probability. However, if we interpret an
NTM that way, then the acceptance rule is rather different from the rule
for RP. Instead, an input is rejected if its probability of acceptance is
0, and the input is accepted if its probability of acceptance is any value
greater than 0, no matter how small.


type, which is sometimes called a Monte-Carlo algorithm. That is, regardless 
of running time, we may say that a randomized TM is “Monte-Carlo” if it either 
accepts with probability 0 or accepts with probability at least 1/2, with nothing 
in between. Point (3) simply addresses the running time, which is independent 
of whether or not the TM is “Monte-Carlo.” 


Example 11.14: Consider the randomized TM of Example 11.13. It surely 
satisfies condition (3), since its running time is O(n) regardless of the contents of 
the random tape. However, it does not accept any language at all, in the sense 
required by the definition of RP. The reason is that, while the homogeneous 
inputs like 000 are accepted with probability at least 1/2, and thus satisfy 
point (2), there are other inputs, like 001, that are accepted with a probability 
that is neither 0 nor at least 1/2; e.g., 001 is accepted with probability 1/16. 


Example 11.15: Let us describe, informally, a randomized TM that is both 
polynomial-time and Monte-Carlo, and therefore accepts a language in RP. 
The input will be interpreted as a graph, and the question is whether the graph 
has a triangle, that is, three nodes all pairs of which are connected by edges. 
Inputs with a triangle are in the language; others are not. 

The Monte-Carlo algorithm will repeatedly pick an edge (x,y) at random 
and pick a node z, other than x and y, at random as well. Each choice is 
determined by looking at some new random bits from the random tape. For 
each x, y, and z selected, the TM tests whether the input holds edges (x, z) 
and (y,z), and if so it declares that the input graph has a triangle. 

A total of k choices of an edge and a node are made; the TM accepts if any 
one of them proves to be a triangle, and if not, it gives up and does not accept. 
If the graph has no triangle, then it is not possible that one of the k choices 
will prove to be a triangle, so condition (1) in the definition of RP is met: if 
the input is not in the language, the probability of acceptance is 0. 
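One run of this Monte-Carlo test can be sketched in Python. It is a sketch only and assumes the graph is given as a node list and an edge list, not as a TM input tape. It accepts only after it has verified both edges (x, z) and (y, z), so a graph with no triangle is never accepted.

```python
import random


def triangle_test(nodes, edges, k, rng=random):
    """k experiments, each picking a random edge (x, y) and a random node z
    other than x and y. Accepts (True) only if some experiment finds edges
    (x, z) and (y, z), i.e., an actual triangle."""
    nodes, edges = list(nodes), list(edges)
    edge_set = {frozenset(e) for e in edges}
    if len(nodes) < 3 or not edges:
        return False
    for _ in range(k):
        x, y = rng.choice(edges)
        z = rng.choice([v for v in nodes if v not in (x, y)])
        if frozenset((x, z)) in edge_set and frozenset((y, z)) in edge_set:
            return True
    return False


# One triangle {0, 1, 2} with a path 2-3-4-5-6-7 attached: n = 8, e = 8.
nodes = list(range(8))
edges = [(0, 1), (1, 2), (0, 2)] + [(i, i + 1) for i in range(2, 7)]
print(triangle_test(nodes, edges, k=16))
```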
Suppose the graph has n nodes and e edges. If the graph has at least one
triangle, then the probability that its three nodes will be selected on any one
experiment is $\left(\frac{3}{e}\right)\left(\frac{1}{n-2}\right)$. That is, three of the e edges are in the triangle, and
if any of these three are picked, then the probability is $1/(n-2)$ that the
third node will also be selected. That probability is small, but we repeat the
experiment k times. The probability that at least one of the k experiments will
yield the triangle is:

$$1 - \left(1 - \frac{3}{e(n-2)}\right)^k \qquad (11.4)$$


There is a commonly used approximation that says for small x, $(1-x)^k$ is
approximately $e^{-kx}$, where $e = 2.718\cdots$ is the base of the natural logarithms.
Thus, if we pick k such that kx = 1, for example, $e^{-kx}$ will be significantly less
than 1/2 and $1 - e^{-kx}$ will be significantly greater than 1/2, about 0.63, to be
more precise. Thus, we can pick $k = e(n-2)/3$ (where e is once again the number
of edges) to be sure that the probability of acceptance of a graph with a triangle,
as given by Equation 11.4, is at least 1/2. Thus, the algorithm described is Monte-Carlo.
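A quick numerical check of this choice of k can be done in Python. The parameter is named `num_edges` to avoid a clash with the constant e. Because $1 - x \le e^{-x}$ for every x, the approximation is in fact an upper bound: $(1-x)^k \le e^{-kx} \le e^{-1}$ whenever $kx \ge 1$. Rounding k up therefore guarantees a success probability of at least $1 - 1/e \approx 0.632$.

```python
import math


def repetitions(num_edges, n):
    """k = e(n-2)/3 rounded up, so that k*x >= 1 for x = 3/(e(n-2))."""
    return (num_edges * (n - 2) + 2) // 3


def success_probability(num_edges, n, k):
    """Right-hand side of Equation (11.4): 1 - (1 - 3/(e(n-2)))**k."""
    x = 3 / (num_edges * (n - 2))
    return 1 - (1 - x) ** k


k = repetitions(8, 8)
print(k, round(success_probability(8, 8, k), 3))   # prints 16 0.644
```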

Now, we must consider the running time of the TM. Both e and n are no 
greater than the input length, and k was chosen to be no more than the square of 
the length, since it is proportional to the product of e and n. Each experiment, 
since it scans the input at most four times (to pick the random edge and node, 
and then to check the presence of two more edges), is linear in the input length. 
Thus, the TM halts after an amount of time that is at most cubic in the input 
length; i.e., the TM has a polynomial running time and therefore satisfies the 
third and final condition for a language to be in RP. 

We conclude that the language of graphs with a triangle is in the class RP. 
Note that this language is also in P, since one could do a systematic search 
of all possibilities for triangles. However, as we mentioned at the beginning of 
Section 11.4, it is actually hard to find examples that appear to be in RP − P.


11.4.5 Recognizing Languages in RP 


Suppose now that we have a polynomial-time, Monte-Carlo Turing machine M 
to recognize a language L. We are given a string w, and we want to know if
w is in L. If we run M on w, using coin-flips or some other random-number-generating
device to simulate the creation of random bits, then we know:


1. If w is not in L, then our run will surely not lead to acceptance of w. 
2. If w is in L, there is at least a 50% chance that w will be accepted. 


However, if we simply take the outcome of this run to be definitive, we shall 
sometimes reject w when we should have accepted (a false negative result), 
although we shall never accept when we should not (a false positive result). 
Thus, we must distinguish between the randomized TM itself and the algorithm
that we use to decide whether or not w is in L. We can never avoid false
negatives altogether, although by repeating the test many times, we can reduce
the probability of a false negative to be as small as we like.


Is Fraction 1/2 Special in the Definition of RP?


While we defined RP to require that the probability of accepting a string
w in L should be at least 1/2, we could have defined RP with any constant
that lies properly between 0 and 1 in place of 1/2. Theorem 11.16 says
that we could, by repeating the experiment made by M the appropriate
number of times, make the probability of acceptance as high as we like,
up to but not including 1. Further, the same technique for decreasing the
probability of nonacceptance for a string in L that we used in Section 11.4.5
will allow us to take a randomized TM with any probability greater than
0 of accepting w in L and boost that probability to 1/2 by repeating
the experiment some constant number of times.

We shall continue to require 1/2 as the probability of acceptance in
the definition of RP, but we should be aware that any nonzero probability
is sufficient to use in the definition of the class RP. On the other hand,
changing the constant from 1/2 will change the language defined by a
particular randomized TM. For instance, we observed in Example 11.14
how lowering the required probability to 1/16 would cause string 001 to
be in the language of the randomized TM discussed there.

For instance, if we want a probability of false negative of one in a billion,
we may run the test thirty times. If w is in L, then the chance that all thirty
tests will fail to lead to acceptance is no greater than $2^{-30}$, which is less than
$10^{-9}$, or one in a billion. In general, if we want a probability of false negatives
less than c > 0, we must run the test $\lceil \log_2(1/c) \rceil$ times. Since this number is a
constant if c is, and since one run of the randomized TM M takes polynomial
time because L is assumed to be in RP, we know that the repeated test also
takes a polynomial amount of time. The implication of these considerations is
stated as a theorem, below.
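The repeated test can be sketched in Python. Here `monte_carlo_run` is a hypothetical stand-in for one run of M on w. It may accept only strings in L, and it accepts each string in L with probability at least 1/2. Accepting as soon as any run accepts adds no false positives, and the number of runs depends only on c.

```python
import math
import random


def repetitions_for(c):
    """Number t of independent runs needed so that 2**-t <= c (for 0 < c < 1)."""
    return max(0, math.ceil(math.log2(1 / c)))


def amplified(monte_carlo_run, w, c):
    """Accept w iff at least one of t independent runs accepts.

    A Monte-Carlo run never accepts a string outside L, so no false positives
    are introduced; for w in L, all t runs reject with probability <= 2**-t <= c."""
    return any(monte_carlo_run(w) for _ in range(repetitions_for(c)))


print(repetitions_for(1e-9))   # prints 30
```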


Theorem 11.16: If L is in RP, then for any constant c > 0, no matter how
small, there is a polynomial-time randomized algorithm that renders a decision 
whether its given input w is in L, makes no false-positive errors, and makes 
false-negative errors with probability no greater than c. 


11.4.6 The Class ZPP 


Our second class of languages involving randomization is called zero-error, prob- 
abilistic, polynomial, or ZPP. The class is based on a randomized TM that 
always halts, and has an expected time to halt that is some polynomial in the
length of the input. This TM accepts its input if it enters an accepting state
(and therefore halts at that time), and it rejects its input if it halts without
accepting. Thus, the definition of class ZPP is almost the same as the definition
of P, except that ZPP allows the behavior of the TM to involve randomness,
and the expected running time, rather than the worst-case running time, is
measured.

A TM that always gives the correct answer, but whose running time varies 
depending on the values of some random bits, is sometimes called a Las-Vegas
Turing machine or Las-Vegas algorithm. We may thus think of ZPP as the 
languages accepted by Las-Vegas Turing machines with a polynomial expected 
running time. 


11.4.7 Relationship Between RP and ZPP 


There is a simple relationship between the two randomized classes we have 
defined. To state this theorem, we first need to look at the complements of the 
classes. It should be clear that if L is in ZPP, then so is its complement,
$\overline{L}$. The reason is that, if L is accepted by a polynomial-expected-time
Las-Vegas TM M, then $\overline{L}$ is accepted by a modification of M in which
we turn acceptance by M into halting without acceptance, and if M halts without
accepting, we instead go to an accepting state and halt.

However, it is not obvious that RP is closed under complementation, be- 
cause the definition of Monte-Carlo Turing machines treats acceptance and 
rejection asymmetrically. Thus, let us define the class co-RP to be the set
of languages L such that $\overline{L}$ is in RP; i.e., co-RP is the set of
complements of the languages in RP.


Theorem 11.17: ZPP = RP ∩ co-RP.


PROOF: We first show RP ∩ co-RP ⊆ ZPP. Suppose L is in RP ∩ co-RP.
That is, both L and $\overline{L}$ have Monte-Carlo TM’s, each with a polynomial running
time. Assume that p(n) is a large enough polynomial to bound the running 
times of both machines. We design a Las-Vegas TM M for L as follows. 


1. Run the Monte-Carlo TM for L; if it accepts, then M accepts and halts. 


2. If not, run the Monte-Carlo TM for $\overline{L}$. If that TM accepts, then M halts
without accepting. Otherwise, M returns to step (1). 


Clearly, M only accepts an input w if w is in L, and only rejects w if w 
is not in L. The expected running time of one round (an execution of steps 1
and 2) is at most 2p(n). Moreover, the probability that any one round will resolve the
issue is at least 1/2. If w is in L, then step (1) has at least a 50% chance of leading
to acceptance by M, and if w is not in L, then step (2) has at least a 50% chance of
leading to rejection by M. Thus, the expected running time of M is no more
than

$$2p(n) + \frac{1}{2}2p(n) + \frac{1}{4}2p(n) + \frac{1}{8}2p(n) + \cdots = 4p(n)$$
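The machine M from the first half of the proof can be sketched as a loop. The two tests `mc_for_L` and `mc_for_complement` are hypothetical one-sided Monte-Carlo procedures, the toy language (even integers) is chosen only so the code can run, and a seeded random-number generator plays the role of the random tape.

```python
import random


def las_vegas(mc_for_L, mc_for_complement, w, rng):
    """Sketch of the machine M from the proof of Theorem 11.17.

    Each test is one-sided: it never accepts a wrong input and accepts a
    right one with probability at least 1/2. Returns (decision, rounds used).
    """
    rounds = 0
    while True:
        rounds += 1
        if mc_for_L(w, rng):            # step (1): an acceptance is always correct
            return True, rounds
        if mc_for_complement(w, rng):   # step (2): w is certainly not in L
            return False, rounds


# Toy example (hypothetical): L is the set of even integers.
def mc_even(w, rng):
    return w % 2 == 0 and rng.random() < 0.5


def mc_odd(w, rng):
    return w % 2 == 1 and rng.random() < 0.5


rng = random.Random(0)
results = [las_vegas(mc_even, mc_odd, w, rng) for w in range(1000)]
average_rounds = sum(r for _, r in results) / len(results)
```

Because each round settles the question with probability at least 1/2, `average_rounds` should come out close to 2; with each round costing at most 2p(n), that is where the $4p(n)$ bound comes from.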

Now, let us consider the converse: assume L is in ZPP and show L is in 
both RP and co-RP. We know L is accepted by a Las-Vegas TM $M_1$ with an
expected running time that is some polynomial p(n). We construct a
Monte-Carlo TM $M_2$ for L as follows. $M_2$ simulates $M_1$ for 2p(n) steps. If $M_1$ accepts
during this time, so does $M_2$; otherwise $M_2$ rejects.

Suppose that input w of length n is not in L. Then $M_1$ will surely not accept
w, and therefore neither will $M_2$. Now, suppose w is in L. $M_1$ will surely accept
w eventually, but it might or might not accept within 2p(n) steps.

However, we claim that the probability $M_1$ accepts w within 2p(n) steps is
at least 1/2. Suppose the probability of acceptance of w by $M_1$ within time
2p(n) were some constant c < 1/2. Then the expected running time of $M_1$ on input w
is at least (1 - c)2p(n), since 1 - c is the probability that $M_1$ will take more than
2p(n) time. However, if c < 1/2, then 2(1 - c) > 1, and the expected running
time of $M_1$ on w is greater than p(n). We have contradicted the assumption
that $M_1$ has expected running time at most p(n) and conclude therefore that the
probability $M_2$ accepts is at least 1/2. Thus, $M_2$ is a polynomial-time-bounded
Monte-Carlo TM, proving that L is in RP.

For the proof that L is also in co-RP, we use essentially the same construction,
but we complement the outcome of $M_2$. That is, to accept $\overline{L}$, we have $M_2$
accept when $M_1$ rejects within time 2p(n), while $M_2$ rejects otherwise. Now,
$M_2$ is a polynomial-time-bounded Monte-Carlo TM for $\overline{L}$.
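The clock-limited simulation in this direction can be sketched the same way. As an assumption of this sketch, a Las-Vegas machine is modeled by a Python generator that yields once per step and returns its decision when it halts; `truncated_run` stops it after `budget` steps, which plays the role of 2p(n). Passing `accept_if=False` gives the complemented machine used for co-RP. The names are illustrative.

```python
import random


def truncated_run(las_vegas_steps, w, budget, rng, accept_if=True):
    """Run a step-by-step Las-Vegas computation for at most `budget` steps.

    las_vegas_steps(w, rng) is a hypothetical generator that yields once per
    simulated step and finally returns True (accept) or False (reject).
    With accept_if=True this plays M_2 for L; with accept_if=False it accepts
    when M_1 rejects in time, which is the machine for the complement of L.
    """
    steps = las_vegas_steps(w, rng)
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration as halted:
            return halted.value == accept_if
    return False  # out of time: give up without accepting


def toy_las_vegas(w, rng):
    """Hypothetical Las-Vegas decider for even integers with a random running time."""
    while rng.random() < 0.5:   # each loop iteration is one extra step
        yield
    return w % 2 == 0
```

For the toy decider the expected number of steps (calls to `next`) is 2, so a budget of 4 corresponds to 2p(n) with p(n) = 2; an even input is then accepted within the budget in well over half the runs, as the argument above guarantees.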


11.4.8 Relationships to the Classes P and NP 


Theorem 11.17 tells us that ZPP ⊆ RP. We can place these classes between
P and NP by the following simple theorems. 


Theorem 11.18: P ⊆ ZPP.


PROOF: Any deterministic, polynomial-time bounded TM is also a Las-Vegas,
polynomial-time bounded TM that happens not to use its ability to make
random choices.


Theorem 11.19: RP ⊆ NP.


PROOF: Suppose we are given a polynomial-time-bounded Monte-Carlo TM
$M_1$ for a language L. We can construct a nondeterministic TM $M_2$ for L with
the same time bound. Whenever $M_1$ examines a random bit for the first time,
$M_2$ chooses, nondeterministically, both possible values for that bit, and writes
it on a tape of its own that simulates the random tape of $M_1$. $M_2$ accepts
whenever $M_1$ accepts, and does not accept otherwise.

Suppose w is in L. Then since $M_1$ has at least a 50% probability of accepting
w, there must be some sequence of bits on its random tape that leads
to acceptance of w. $M_2$ will choose that sequence of bits, among others, and
therefore also accepts when that choice is made. Thus, w is in $L(M_2)$. However,
if w is not in L, then no sequence of random bits will make $M_1$ accept, and
therefore no sequence of choices makes $M_2$ accept. Thus, w is not in $L(M_2)$.
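Read deterministically, the construction of $M_2$ amounts to trying every possible random tape. The sketch below models a Monte-Carlo TM as a hypothetical Python function of the input and a tuple of r random bits; enumerating all $2^r$ tuples one after another is exponential, which is exactly the cost that nondeterminism lets $M_2$ avoid by guessing the bits instead.

```python
from itertools import product


def ntm_accepts(monte_carlo, w, r):
    """Deterministic simulation of the NTM M_2 from Theorem 11.19.

    monte_carlo(w, bits) is a hypothetical Monte-Carlo TM, modeled as a
    deterministic function of the input and a tuple of r random bits.
    M_2 accepts if *some* setting of the bits leads to acceptance.
    """
    return any(monte_carlo(w, bits) for bits in product((0, 1), repeat=r))


def toy_monte_carlo(w, bits):
    """Hypothetical: L = multiples of 3; members are accepted only when bits[0] == 1."""
    return w % 3 == 0 and bits[0] == 1
```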


Figure 11.8 shows the relationship between the classes we have introduced 
and the other “nearby” classes. 




Figure 11.8: Relationship of ZPP and RP to other classes 


11.5 The Complexity of Primality Testing 


In this section, we shall look at a particular problem: testing whether an integer 
is a prime. We begin with a motivating discussion concerning the way primes 
and primality testing are essential ingredients in computer-security systems. 
We then show that the primes are in both NP and co-NP. Finally, we discuss
a randomized algorithm that shows the composite numbers are in RP.


11.5.1 The Importance of Testing Primality 


An integer p is prime if the only integers that divide p evenly are 1 and p itself. 
If an integer is not a prime, it is said to be composite. Every composite number 
can be written as a product of primes in a unique way, except for the order of 
the factors. 


Example 11.20: The first few primes are 2, 3, 5, 7, 11, 13, and 17. The 
integer 504 is composite, and its prime factorization is $2^3 \times 3^2 \times 7$.
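For small numbers a factorization like this one is easy to find. The trial-division sketch below (a standard method, not one the text prescribes; the function name is illustrative) tries divisors up to $\sqrt{m}$, which for an n-bit m can mean about $2^{n/2}$ divisions. That exponential growth is why it is no threat to the large products of primes discussed next.

```python
def prime_factorization(m: int) -> dict[int, int]:
    """Map each prime factor of m (m >= 2) to its exponent, by trial division.

    The loop may try every d up to sqrt(m): about 2**(n/2) divisions for an
    n-bit m, i.e., exponential in the length of the input.
    """
    factors: dict[int, int] = {}
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors[d] = factors.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:                     # whatever remains is itself prime
        factors[m] = factors.get(m, 0) + 1
    return factors


print(prime_factorization(504))   # {2: 3, 3: 2, 7: 1}
```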
There are a number of techniques that enhance computer security, for which
the most common methods in use today rely on the assumption that it is hard 
to factor numbers, that is, given a composite number, to find its prime factors. 
In particular, these schemes, based on what are called RSA codes (for R. Rivest, 
A. Shamir, and L. Adleman, the inventors of the technique), use integers of,
say, 128 bits that are the product of two primes, each of about 64 bits. Here 
are two scenarios in which primes play an important part. 


Public-Key Cryptography 


You want to buy a book from an on-line bookseller. The seller asks for your 
credit-card number, but it is too risky to type the number into a form and 
have the form transmitted over phone lines or the Internet. The reason is that 
someone could be snooping on your line, or otherwise intercept packets as they 
travel over the Internet. 

To avoid a snooper being able to read your card number, the seller sends 
your browser a key k, perhaps the 128-bit product of two primes that the 
seller’s computer has generated just for this purpose. Your browser uses a 
function $y = f_k(x)$ that takes both the key k and the data x that you need to
encrypt. The function f, which is part of the RSA scheme, may be generally
known, including to potential snoopers, but it is believed that without knowing
the factorization of k, the inverse function $f_k^{-1}$ such that $x = f_k^{-1}(y)$ cannot be
computed in time that is less than exponential in the length of k. 

Thus, even if a snooper sees y and knows how f works, without first figuring 
out what k is and then factoring it, the snooper cannot recover x, which is in this 
case your credit-card number. On the other hand, the on-line seller, knowing 
the factorization of key k because they generated it in the first place, can easily 
apply $f_k^{-1}$ and recover x from y.


Public-Key Signatures 


The original scenario for which RSA codes were developed is the following. 
You would like to be able to “sign” email so that people could easily determine 
that the email was from you, and yet no one could “forge” your name to an 
email. For instance, you might wish to sign the message x = “I promise to 
pay Sally Lee $10,” but you don’t want Sally to be able to create the signed 
message herself, or for a third party to create such a signed message without 
your knowledge. 

To support these aims, you pick a key k, whose prime factors only you know. 
You publish k widely, say on your Web site, so anyone can apply the function 
$f_k$ to any message. If you want to sign the message x above and send it to
Sally, you compute $y = f_k^{-1}(x)$ and send y to Sally instead. Sally can get $f_k$,
your public key, from your Web site, and with it compute $x = f_k(y)$. Thus, she
knows that you have indeed promised to pay $10. 

If you deny having sent the message y, Sally can argue before a judge that 
only you know the function $f_k^{-1}$, and it would be “impossible” for either her or
any third party to have discovered that function. Thus, only you could have
created y. This system relies on the likely-but-unproven assumption that it is 
too hard to factor numbers that are the product of two large primes. 
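The text keeps $f_k$ abstract. The sketch below assumes the standard textbook form of RSA, which the text does not spell out: the public key is the pair (k, e), $f_k(x) = x^e$ modulo k, and $f_k^{-1}(y) = y^d$ modulo k, where d is the inverse of e modulo (p - 1)(q - 1) and so can be computed only by someone who knows the prime factors p and q of k. The primes are tiny and there is no padding, so this is an illustration only; it shows both scenarios, encryption with $f_k$ and signing with $f_k^{-1}$. It relies on `pow(e, -1, phi)`, available from Python 3.8.

```python
from math import gcd


def make_toy_keys(p: int, q: int, e: int = 17):
    """Textbook RSA from two (tiny, insecure) primes p and q.

    Returns the public pair (k, e) and the private pair (k, d).
    """
    k = p * q
    phi = (p - 1) * (q - 1)       # easy to compute only if you know p and q
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p - 1)(q - 1)")
    d = pow(e, -1, phi)           # inverse of e modulo phi (Python 3.8+)
    return (k, e), (k, d)


def f(public, x: int) -> int:
    """f_k: anyone holding the public key can compute it."""
    k, e = public
    return pow(x, e, k)


def f_inverse(private, y: int) -> int:
    """f_k^{-1}: requires d, i.e., knowledge of the factorization of k."""
    k, d = private
    return pow(y, d, k)


public, private = make_toy_keys(61, 53)

# Encryption: the browser sends y = f_k(x); only the seller can invert it.
card_number = 1234
y = f(public, card_number)
assert f_inverse(private, y) == card_number

# Signing: you send y = f_k^{-1}(x); anyone can check it with f_k.
message = 10
signature = f_inverse(private, message)
assert f(public, signature) == message
```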


Requirements Regarding Complexity of Primality Testing 


Both scenarios above are believed to work and to be secure, in the sense that 
it really does take exponential time to factor the product of two large primes. 
The complexity theory we have studied here and in Chapter 10 enters into the
study of security and cryptography in two ways: 


1. The construction of public keys requires that we be able to find large 
primes quickly. It is a basic fact of number theory that the probability of 
an n-bit number being a prime is on the order of 1/n. Thus, if we had 
a polynomial-time (in n, not in the value of the prime itself) way to test 
whether an n-bit number was prime, we could pick numbers at random, 
test them, and stop when we found one to be prime. That would give 
us a polynomial-time Las-Vegas algorithm for discovering primes, since 
the expected number of numbers we have to test before meeting a prime 
of n bits is about n. For instance, if we want 64-bit primes, we would 
have to test about 64 integers on the average, although by bad luck we 
could have to try indefinitely more than that. Unfortunately, the recently 
discovered polynomial-time test for primes is not yet efficient enough
to be used in practice. However, there is a Monte-Carlo algorithm that
is polynomial-time, as we shall see in Section 11.5.4. 


2. The security of RSA-based cryptography depends on there being no poly- 
nomial (in the number of bits of the key) way to factor in general, in 
particular no way to factor a number known to be the product of exactly 
two large primes. We would be very happy if we could show that the set 
of primes is an NP-complete language, or even that the set of composite 
numbers was NP-complete. For then, a polynomial factoring algorithm 
would prove P = NP, since it would yield polynomial-time tests for both 
these languages. Alas, as we remarked earlier, after several decades of 
research there is now a definite proof that testing primes is a problem 
that lies in P. 
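The pick-and-test loop of item 1 can be sketched as follows. As an assumption made for brevity, `is_prime` uses trial division, which is exponential in n and only reasonable for small sizes; it stands in for whichever primality test is used in practice. The returned number is always prime and only the number of tries is random, which is the Las-Vegas behavior described in item 1.

```python
import random


def is_prime(m: int) -> bool:
    """Trial division: correct, but exponential in the bit length of m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def random_prime(n_bits: int, rng: random.Random) -> tuple[int, int]:
    """Pick random n-bit numbers until one is prime; return (prime, tries)."""
    if n_bits < 2:
        raise ValueError("need at least 2 bits")
    tries = 0
    while True:
        tries += 1
        # force the top bit so the candidate really has n_bits bits
        candidate = rng.getrandbits(n_bits - 1) | (1 << (n_bits - 1))
        if is_prime(candidate):
            return candidate, tries
```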


11.5.2 Introduction to Modular Arithmetic 


Before looking at algorithms for recognizing the set of primes, we shall introduce 
some basic concepts regarding modular arithmetic, that is, the usual arithmetic 
operations executed modulo some integer, often a prime. Let p be any integer. 
The integers modulo p are 0, 1, ..., p - 1.

We can define addition and multiplication modulo p to apply only to this 
set of p integers by performing the ordinary calculation and then computing the 
remainder when the result is divided by p. Addition is quite straightforward,
since the sum is either less than p, in which case we have nothing additional to
do, or it is between p and 2p - 2, in which case we subtract p to get an integer
in the range 0, 1, ..., p - 1. Modular addition obeys the usual algebraic laws;
it is commutative, associative, and has 0 as the identity. Subtraction is still
the inverse of addition, and we can compute the modular difference x - y by
subtracting as usual, and adding p if the result is below 0. The negation of x,
which is -x, is the same as 0 - x, just as in ordinary arithmetic. Thus, -0 = 0,
and if x ≠ 0, then -x is the same as p - x.


Example 11.21: Suppose p = 13. Then 3 + 5 = 8, and 7 + 10 = 4. To see the
latter, note that in ordinary arithmetic, 7 + 10 = 17, which is not less than 13.
We therefore subtract 13 to get the proper result, 4. The value of -5 modulo
13 is 13 - 5, or 8. The difference 11 - 4 modulo 13 is 7, while the difference
4 - 11 is 6. To see the latter, in ordinary arithmetic, 4 - 11 = -7, so we must
add 13 to get 6.
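The three operations can be written directly from their descriptions: one conditional subtraction for addition, one conditional addition for subtraction. The function names are illustrative; the assertions reproduce the values of Example 11.21.

```python
def mod_add(x: int, y: int, p: int) -> int:
    """x + y modulo p, for 0 <= x, y < p."""
    s = x + y                     # 0 <= s <= 2p - 2
    return s - p if s >= p else s


def mod_sub(x: int, y: int, p: int) -> int:
    """x - y modulo p: subtract as usual, then add p if the result is negative."""
    d = x - y
    return d + p if d < 0 else d


def mod_neg(x: int, p: int) -> int:
    """-x modulo p: 0 stays 0, otherwise p - x."""
    return 0 if x == 0 else p - x


# The values from Example 11.21 (p = 13).
assert mod_add(3, 5, 13) == 8
assert mod_add(7, 10, 13) == 4
assert mod_neg(5, 13) == 8
assert mod_sub(11, 4, 13) == 7
assert mod_sub(4, 11, 13) == 6
```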


Multiplication modulo p is performed by multiplying as ordinary numbers, 
and then taking the remainder of the result divided by p. Multiplication also 
satisfies the usual algebraic laws; it is commutative and associative, 1 is the iden- 
tity, 0 is the annihilator, and multiplication distributes over addition. However, 
division by nonzero values is trickier, and even the existence of inverses for
integers modulo p depends on whether or not p is a prime. In general, if x is one
of the integers modulo p, that is, 0 ≤ x < p, then $x^{-1}$, or 1/x, is that number
y, if it exists, such that xy = 1 modulo p.


Figure 11.9: Multiplication modulo 7 


Example 11.22: In Fig. 11.9 we see the multiplication table for the nonzero 
integers modulo the prime 7. The entry in row i and column j is the product
ij modulo 7. Notice that each of the nonzero integers has an inverse; 2 and 4
are each other’s inverses, so are 3 and 5, while 1 and 6 are their own inverses.
That is, 2 × 4, 3 × 5, 1 × 1, and 6 × 6 are all 1. Thus, we can divide x by
any nonzero number y by computing $y^{-1}$ and then multiplying $x \times y^{-1}$. For
instance, $3/4 = 3 \times 4^{-1} = 3 \times 2 = 6$.

Compare this situation with the multiplication table modulo 6. First, we 
observe that only 1 and 5 even have inverses; they are each their own inverse. 
Other numbers have no inverse. In addition, there are numbers that are not
0, but whose product is 0, such as 2 and 3. That situation never occurs for
ordinary integer arithmetic, and it never happens when arithmetic is modulo a
prime.


Figure 11.10: Multiplication modulo 6
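The contents of Figs. 11.9 and 11.10 can be regenerated with a short sketch. It computes the table of products of nonzero numbers modulo p, the inverses that exist, and the pairs of nonzero numbers whose product is 0; the helper names are hypothetical.

```python
def multiplication_table(p: int) -> list[list[int]]:
    """Row i, column j (both from 1 to p - 1) holds i * j modulo p."""
    return [[(i * j) % p for j in range(1, p)] for i in range(1, p)]


def inverses(p: int) -> dict[int, int]:
    """Each nonzero x modulo p that has an inverse, mapped to that inverse."""
    return {x: y for x in range(1, p) for y in range(1, p) if (x * y) % p == 1}


def zero_divisor_pairs(p: int) -> list[tuple[int, int]]:
    """Pairs x <= y of nonzero numbers modulo p whose product is 0."""
    return [(x, y) for x in range(1, p) for y in range(x, p) if (x * y) % p == 0]


for row in multiplication_table(7):     # the table of Fig. 11.9
    print(" ".join(str(v) for v in row))
print(inverses(7))            # every nonzero number has an inverse
print(inverses(6))            # only 1 and 5 do
print(zero_divisor_pairs(6))  # includes (2, 3)
```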


There is another distinction between multiplication modulo a prime and 
modulo a composite number that turns out to be quite important for primality 
tests. The degree of a number a modulo p is the smallest positive power of a 
that is equal to 1. Some useful facts, which we shall not prove here, are:


• If p is a prime, then $a^{p-1} = 1$ modulo p for every a in the range 1 to p - 1.
This statement is called Fermat’s theorem.⁷


• The degree of a modulo a prime p is always a divisor of p - 1.


• If p is a prime, there is always some a that has degree p - 1 modulo p.


Example 11.23: Consider again the multiplication table modulo 7 in Fig. 
11.9. The degree of 2 is 3, since $2^2 = 4$ and $2^3 = 1$. The degree of 3 is 6, since
$3^2 = 2$, $3^3 = 6$, $3^4 = 4$, $3^5 = 5$, and $3^6 = 1$. By similar calculations, we find
that 4 has degree 3, 5 has degree 6, 6 has degree 2, and 1 has degree 1. 
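The degree can be computed straight from its definition by multiplying by a until the product is 1. This sketch (the function name is illustrative) takes up to p - 1 multiplications, which is fine for p = 7 but exponential in the bit length of p, a point that matters again in Section 11.5.5.

```python
def degree(a: int, p: int):
    """Smallest e >= 1 with a**e = 1 modulo p, or None if no power of a is 1.

    Follows the definition directly, so it may take up to p - 1
    multiplications: fine for small p, exponential in the bit length of p.
    """
    value = a % p
    for e in range(1, p):
        if value == 1:
            return e
        value = (value * a) % p
    return None


print({a: degree(a, 7) for a in range(1, 7)})
# {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```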


11.5.3 The Complexity of Modular-Arithmetic Computations


Before proceeding to the applications of modular arithmetic to primality testing, 
we must establish some basic facts about the running time of the essential 
operations. Suppose we wish to compute modulo some prime p, and the binary 
representation of p is n bits long; i.e., p itself is around $2^n$. As always, the
running time of a computation is stated in terms of n, the input length, rather
than p, the “value” of the input. For instance, counting up to p takes time
$O(2^n)$, so any computation that involves p steps will not be polynomial-time,
as a function of n. 

However, we can surely add two numbers modulo p in O(n) time on a typical 
computer or multitape TM. Recall that we simply add the binary numbers, 
and if the result is p or greater, then subtract p. Likewise, we can multiply
two numbers in $O(n^2)$ time, either on a computer or a Turing machine. After
multiplying the numbers in the ordinary way, and getting a result of at most
2n bits, we divide by p and take the remainder.

⁷Do not confuse Fermat’s theorem with “Fermat’s last theorem,” which asserts the
nonexistence of positive integer solutions to $x^n + y^n = z^n$ for n ≥ 3.

Raising a number x to an exponent is trickier, since that exponent may itself 
be exponential in n. As we shall see, an important step is raising x to the power 
p - 1. Since p - 1 is around $2^n$, if we were to multiply x by itself p - 2 times, we
would need $O(2^n)$ multiplications, and even though each multiplication involved
only n-bit numbers and could be carried out in $O(n^2)$ time, the total time would
be $O(n^2 2^n)$, which is not polynomial in n.

Fortunately, there is a “recursive-doubling” trick that lets us compute $x^{p-1}$
(or any other power of x up to p) in time that is polynomial in n:

1. Compute the at most n powers $x, x^2, x^4, x^8, \ldots$, until the exponent
exceeds p - 1. Each value is an n-bit number that is computed in $O(n^2)$
time by squaring the previous value in the sequence, so the total work is
$O(n^3)$.


2. Find the binary representation of p - 1, say $p - 1 = a_{n-1} \cdots a_1 a_0$. We can
write

$$p - 1 = a_0 + 2a_1 + 4a_2 + \cdots + 2^{n-1}a_{n-1}$$

where each $a_i$ is either 0 or 1. Therefore,

$$x^{p-1} = x^{a_0 + 2a_1 + 4a_2 + \cdots + 2^{n-1}a_{n-1}}$$

which is the product of those values $x^{2^i}$ for which $a_i = 1$. Since we
computed each of those $x^{2^i}$’s in step (1), and each is an n-bit number, we
can compute the product of these n or fewer numbers in $O(n^3)$ time.


Thus, the entire computation of $x^{p-1}$ takes $O(n^3)$ time.
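A sketch of recursive doubling in Python follows. It interleaves the two steps instead of storing the list of squares: `square` runs through $x, x^2, x^4, \ldots$ modulo p while the bits of the exponent are read from $a_0$ upward. Python’s built-in three-argument `pow(x, e, p)` computes the same value and is used in the later sketches.

```python
def mod_pow(x: int, e: int, p: int) -> int:
    """x**e modulo p by recursive doubling (square-and-multiply).

    `square` runs through x, x**2, x**4, x**8, ... modulo p (step 1), and the
    bits a_0, a_1, ... of e decide which of those powers enter the product
    (step 2). The two steps are interleaved here instead of storing the list.
    """
    result = 1 % p
    square = x % p
    while e > 0:
        if e & 1:                     # current bit a_i is 1
            result = (result * square) % p
        square = (square * square) % p
        e >>= 1
    return result


print(mod_pow(3, 6, 7))        # 1, as Fermat's theorem predicts
```

The loop runs once per bit of the exponent, so for e = p - 1 it performs at most 2n multiplications of n-bit numbers, in line with the $O(n^3)$ bound.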


11.5.4 Random-Polynomial Primality Testing 


We shall now discuss how to use randomized computation to find large prime 
numbers. More precisely, we shall show that the language of composite numbers 
is in RP. The method actually used to generate n-bit primes is to pick an n-bit
number at random and apply the Monte-Carlo algorithm to recognize composite
numbers some large number of times, say 50. If any test says that the number
is composite, then we know it is not a prime. If all 50 fail to say that it is
composite, there is no more than a $2^{-50}$ probability that it really is composite.
Thus, we can fairly safely say that the number is prime and base our secure 
operation on that fact. 

Can We Factor in Random Polynomial Time?


Notice that the algorithm of Section 11.5.4 may tell us that a number is
composite, but does not tell us how to factor the composite number. It is
believed that there is no way to factor numbers, even using randomness,
that takes only polynomial time, or even expected polynomial time. If
that assumption were incorrect, then the applications that we discussed
in Section 11.5.1 would be insecure and could not be used.


We shall not give the complete algorithm here, but rather discuss an idea
that works except in a very small number of cases. Recall that Fermat’s theorem
tells us that if p is a prime, then $x^{p-1}$ modulo p is always 1 for every x in the
range 1 to p - 1. It is also a fact that if p is a composite number, and there is any
x relatively prime to p for which $x^{p-1}$ modulo p is not 1, then for at least half
the values of x in the range 1 to p - 1, we shall find $x^{p-1} \neq 1$ modulo p.

Thus, we shall use as our Monte-Carlo algorithm for the composite numbers:


1. Pick an x at random in the range 1 to p - 1.


2. Compute $x^{p-1}$ modulo p. Note that if p is an n-bit number, then this
calculation takes $O(n^3)$ time by the discussion at the end of Section 11.5.3.


3. If $x^{p-1} \neq 1$ modulo p, accept; p is composite. Otherwise, halt without
accepting.


If p is prime, then $x^{p-1} = 1$, so we always halt without accepting; that is one
part of the Monte-Carlo requirement, that if the input is not in the language,
then we never accept. For almost all the composite numbers, at least half the
values of x will have $x^{p-1} \neq 1$, so we have at least 50% chance of acceptance on
any one run of this algorithm; that is the other requirement for an algorithm 
to be Monte-Carlo. 
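The test above, together with the repetition described at the start of Section 11.4.5’s discussion of RP, can be sketched as follows. The helper names are illustrative, and the code inherits the caveat discussed next: it gives no 50% guarantee on Carmichael numbers.

```python
import random


def fermat_round(p: int, rng: random.Random) -> bool:
    """One run of the Monte-Carlo test; True means "accept: p is composite".

    An acceptance is always correct, because x**(p-1) = 1 modulo p for every x
    in 1..p-1 when p is prime (Fermat's theorem).
    """
    x = rng.randint(1, p - 1)
    return pow(x, p - 1, p) != 1


def seems_composite(p: int, runs: int, rng: random.Random) -> bool:
    """Repeat the test; a composite that is not a Carmichael number escapes
    detection with probability at most 2**(-runs)."""
    return any(fermat_round(p, rng) for _ in range(runs))
```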

What we have described so far would be a demonstration that the compos- 
ite numbers are in RP, if it were not for the existence of a small number of 
composite numbers c that have $x^{c-1} = 1$ modulo c, for the majority of x in
the range 1 to c - 1, in particular for those x that do not share a common
prime factor with c. These numbers, called Carmichael numbers, require us to
do another, more complex test (which we do not describe here) to detect that
they are composite. The smallest Carmichael number is 561. That is, one can
show $x^{560} = 1$ modulo 561 for all x that are not divisible by 3, 11, or 17, even
though 561 = 3 x 11 x 17 is evidently composite. Thus, we shall claim, but 
without a complete proof, that: 


Theorem 11.24: The set of composite numbers is in RP. 
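The behavior of 561 described above is easy to check by brute force. The sketch below lists every x in the range 1 to 560 with $x^{560} = 1$ modulo 561 (the “liars” that fool the Fermat test; the name is illustrative) and compares that list with the x that share no prime factor with 561.

```python
from math import gcd


def fermat_liars(c: int) -> list[int]:
    """All x in 1..c-1 with x**(c-1) = 1 modulo c."""
    return [x for x in range(1, c) if pow(x, c - 1, c) == 1]


c = 561                                   # 3 * 11 * 17
liars = fermat_liars(c)
coprime = [x for x in range(1, c) if gcd(x, c) == 1]
print(liars == coprime, len(liars), c - 1)   # True 320 560
```

The liars are exactly the 320 values relatively prime to 561, so a single run of the test in Section 11.5.4 detects 561 only for the other 240 of the 560 choices of x, which is less than half.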


11.5.5 Nondeterministic Primality Tests 


Let us now take up another interesting and significant result about testing
primality: that the language of primes is in NP ∩ co-NP. Therefore the language
of composite numbers, the complement of the primes, is also in NP ∩ co-NP.
The significance of this fact is that it is unlikely to be the case that the primes
or the composite numbers are NP-complete, for if either were true then we
would have the unexpected equality NP = co-NP. This observation had
motivated several decades of research attempting to find a polynomial-time test for
primality, culminating in the recent discovery of such an algorithm. 

One part is easy: the composite numbers are obviously in NP, so the primes
are in co-NP. We prove that fact first.


Theorem 11.25: The set of composite numbers is in NP.


PROOF: The nondeterministic, polynomial-time algorithm for the composite 
numbers is: 


1. Given an n-bit number p, guess a factor f of at most n bits. Do not choose 
f = 1 or f = p, however. This part is nondeterministic, with all possible
values of f being guessed along some sequence of choices. However, the 
time taken by any sequence of choices is O(n). 


2. Divide p by f, and check that the remainder is 0. Accept if so. This part 
is deterministic and can be carried out in time $O(n^2)$ on a multitape TM.


If p is composite, then it must have at least one factor f other than 1 and p. 
The NTM, since it guesses all possible numbers of up to n bits, will in some 
branch guess f. That branch leads to acceptance. Conversely, acceptance by 
the NTM implies that a factor of p other than 1 or p itself has been found. 
Thus, the NTM described accepts the language consisting of all and only the 
composite numbers. 
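The deterministic part of one branch, and a brute-force stand-in for the whole NTM, can be sketched as follows. The names are illustrative; trying every f in turn is exponential in n, whereas the NTM needs only one accepting branch.

```python
def check_factor_guess(p: int, f: int) -> bool:
    """The deterministic part of one branch: is f a factor other than 1 and p?"""
    return 1 < f < p and p % f == 0


def composite_by_trying_every_branch(p: int) -> bool:
    """Explore every guess f deterministically; exponential in the bit length of p."""
    return any(check_factor_guess(p, f) for f in range(2, p))
```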


Recognizing the primes with a NTM is harder. While we were able to 
guess a reason (a factor) that a number is not a prime, and then check that 
our guess is correct, how do we “guess” a reason a number is a prime? The 
nondeterministic, polynomial-time algorithm is based on the fact (asserted but 
not proved) that if p is a prime, then there is a number x between 1 and p - 1
that has degree p - 1. For instance, we observed in Example 11.23 that for the
prime p = 7, the numbers 3 and 5 both have degree 6.

While we could guess a number x easily, using the nondeterministic
capability of a NTM, it is not immediately obvious how one then checks that x has
degree p - 1. The reason is that if we apply the definition of “degree” directly,
we need to check that none of $x^2, x^3, \ldots, x^{p-2}$ is 1. To do so requires that we
perform p - 3 multiplications, and that requires time on the order of $2^n$, if p is an n-bit
number.

A better strategy is to make use of another fact that we assert but do not
prove: the degree of x modulo a prime p is a divisor of p - 1. Thus, if we knew
the prime factors of p - 1,⁸ it would be sufficient to check that $x^{(p-1)/q} \neq 1$ for
each prime factor q of p - 1. If none of these powers of x is equal to 1, then
the degree of x must be p - 1. The number of these tests is O(n), so we can
perform them all in a polynomial-time algorithm. Of course we cannot factor
p - 1 into primes easily. However, nondeterministically we can guess the prime
factors of p - 1, and:


a) Check that their product is indeed p - 1.


b) Check that each is a prime, using the nondeterministic, polynomial-time
algorithm that we have been designing, recursively.


⁸Notice that if p is a prime, then p - 1 is never a prime, except in the uninteresting case
p = 3. The reason is that all primes but 2 are odd.


The details of the algorithm, and the proof that it is nondeterministic, poly- 
nomial-time, are in the proof of the theorem below. 


Theorem 11.26: The set of primes is in NP.


PROOF: Given a number p of n bits, we do the following. First, if n is no more 
than 2 (i.e., p is 1, 2, or 3), answer the question directly; 2 and 3 are primes,
while 1 is not. Otherwise: 


1. Guess a list of factors $(q_1, q_2, \ldots, q_k)$, whose binary representations total
at most 2n bits, and none of which has more than n - 1 bits. It is
permitted for the same prime to appear several times, since p - 1 may
have a factor that is a prime raised to a power greater than 1; e.g., if
p = 13, then the prime factors of p - 1 = 12 are in the list (2, 2, 3). This
part is nondeterministic, but each branch takes O(n) time.


2. Multiply the q’s together, and verify that their product is p - 1. This part
takes no more than $O(n^2)$ time and is deterministic.


3. If their product is p - 1, recursively verify that each is a prime, using the
algorithm being described here. 


4. If the q’s are all prime, guess a value of x and check that $x^{p-1} = 1$
modulo p, but that $x^{(p-1)/q_j} \neq 1$ for every one of the $q_j$’s. This test
assures that x has degree p - 1 modulo p, since if it did not, then its degree
would have to divide at least one $(p-1)/q_j$, and we just verified that it did
not. Note in justification that any x, raised to any multiple of its degree,
must be 1. The exponentiations can be done by the efficient method described
in Section 11.5.3. Thus, there are at most k + 1 exponentiations, which is
surely no more than n exponentiations, and each one can be performed in
$O(n^3)$ time, giving us a total time of $O(n^4)$ for this step.


Lastly, we must verify that this nondeterministic algorithm is polynomial-time.
Each of the steps except the recursive step (3) takes time at most $O(n^4)$
along any nondeterministic branch. While this recursion is complicated, we can
visualize the recursive calls as a tree suggested by Fig. 11.11. At the root is
the prime p of n bits that we want to verify. The children of the root are the
$q_j$’s, which are the guessed factors of p - 1 that we must also verify are primes.
Below each $q_j$ are the guessed factors of $q_j - 1$ that we must verify, and so on,
until we get down to numbers of at most 2 bits, which are leaves of the tree.


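[Figure: tree of recursive calls. Root level: p. Level 1: the guessed factors $q_1, \ldots, q_k$ of p - 1. Level 2: the guessed factors of each $q_j - 1$, and so on.]


Figure 11.11: The recursive calls made by the algorithm of Theorem 11.26 form
a tree of height and width at most n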


Since the product of the children of any node is less than the value of the 
node itself, we see that the product of the values of nodes at any depth from the 
root is at most p. The work required at a node with value i, exclusive of work 
done in recursive calls, is at most $a(\log_2 i)^4$ for some constant a; the reason is
that we determined this work to be on the order of the fourth power of the
number of bits needed to represent that value in binary.

Thus, to get an upper bound on the work required by any one level, we must
maximize the sum $\sum_j a(\log_2 i_j)^4$, subject to the constraint that the product
$i_1 i_2 \cdots$ is at most p. Because the fourth power is convex, the maximum occurs
when all of the value is in one of the $i_j$’s. If $i_1 = p$, and there are no other $i_j$’s,
then the sum is $a(\log_2 p)^4$. That is at most $an^4$, since n is the number of bits
in the binary representation of p, and therefore $\log_2 p$ is at most n.

Our conclusion is that the work required at each depth is at most $O(n^4)$.
Since there are at most n levels, $O(n^5)$ work suffices in any branch of the
nondeterministic test for whether p is prime. 
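The checks made along one accepting branch can be sketched as a recursive verifier of what is usually called a Pratt certificate. The nested-tuple format is a hypothetical encoding of the guesses: (p,) for p ≤ 3, and otherwise (p, x, [certificates for $q_1, \ldots, q_k$]). The sketch also checks $x^{p-1} = 1$ modulo p; without that check, a guess such as x = 3 for the composite 15 would pass, since no power of 3 is 1 modulo 15.

```python
from math import prod


def verify_prime_certificate(cert) -> bool:
    """Check one branch of the nondeterministic algorithm of Theorem 11.26.

    A certificate is a hypothetical nested tuple: (p,) for p <= 3, otherwise
    (p, x, [cert_q1, cert_q2, ...]) giving the guessed witness x and
    certificates for the guessed prime factors q_j of p - 1.
    """
    p = cert[0]
    if p < 4:                                  # at most 2 bits: answer directly
        return p in (2, 3)
    _, x, factor_certs = cert
    qs = [c[0] for c in factor_certs]
    if prod(qs) != p - 1:                      # step 2: the q's multiply to p - 1
        return False
    if not all(verify_prime_certificate(c) for c in factor_certs):  # step 3
        return False
    if pow(x, p - 1, p) != 1:                  # step 4: x**(p-1) = 1 modulo p ...
        return False
    # ... but no x**((p-1)/q) is 1, so x has degree exactly p - 1
    return all(pow(x, (p - 1) // q, p) != 1 for q in set(qs))


cert_7 = (7, 3, [(2,), (3,)])                 # 6 = 2 * 3, and 3 has degree 6
cert_29 = (29, 2, [(2,), (2,), cert_7])       # 28 = 2 * 2 * 7
```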


Now we know that both the primes and their complement are in NP. If 
either were NP-complete, then by Theorem 11.2 we would have a proof that
NP = co-NP.


11.5.6 Exercises for Section 11.5


Exercise 11.5.1: Compute the following modulo 13:


a) 11 + 9.


* b) 9 - 11.


c) 5 × 8.


* d) 5/8.


e) $5^8$.


Exercise 11.5.2: We claimed in Section 11.5.4 that for most values of x
between 1 and 560, $x^{560} = 1$ modulo 561. Pick some values of x and verify that
equation. Be sure to express 560 in binary first, and then compute $x^{2^j}$ modulo
561, for various values of j, to avoid doing 559 multiplications, as we discussed
in Section 11.5.3. 


Exercise 11.5.3: An integer x between 1 and p - 1 is said to be a quadratic
residue modulo p if there is some integer y between 1 and p - 1 such that $y^2 = x$ modulo p.


* a) What are the quadratic residues modulo 7? You may use the table of
Fig. 11.9 to help answer the question.


b) What are the quadratic residues modulo 13?


c) Show that if p is an odd prime, then the number of quadratic residues modulo p
is (p - 1)/2; i.e., exactly half the nonzero integers modulo p are quadratic
residues. Hint: Examine your data from parts (a) and (b). Do you see
a pattern explaining why every quadratic residue is the square of two
different numbers? Could one integer be the square of three different
numbers when p is a prime?


11.6 Summary of Chapter 11 


+ The Class co-NP: A language is said to be in co-NP if its complement
is in NP. All languages in P are surely in co-NP, but it is likely that
there are some languages in NP that are not in co-NP, and vice-versa.
In particular, the NP-complete problems do not appear to be in co-NP.


+ The Class PS: A language is said to be in PS (polynomial space) if it
is accepted by a deterministic TM for which there is a polynomial p(n)
such that on input of length n the TM never uses more than p(n) cells of
its tape.


+ The Class NPS: We can also define acceptance by a nondeterministic
TM whose tape-usage is limited by a polynomial function of its input
length. The class of these languages is referred to as NPS. However,
Savitch’s theorem tells us that PS = NPS. In particular, a NTM with
space bound p(n) can be simulated by a DTM using space $p^2(n)$.


+ Randomized Algorithms and Turing Machines: Many algorithms use
randomness productively. On a real computer, a random-number generator
is used to simulate “coin-flipping.” A randomized Turing machine can 
achieve the same random behavior if it is given an additional tape on 
which a sequence of random bits is written. 
+ The Class RP: A language is accepted in random polynomial time if
there is a polynomial-time, randomized Turing machine that has at least 
50% chance of accepting its input if that input is in the language. If the 
input is not in the language, then this TM never accepts. Such a TM or 
algorithm is called “Monte-Carlo.” 


+ The Class ZPP: A language is in the class of zero-error, probabilistic 
polynomial time if it is accepted by a randomized Turing machine that 
always gives the correct decision regarding membership in the language; 
this TM must run in expected polynomial time, although the worst case 
may be greater than any polynomial. Such a TM or algorithm is called 
“Las Vegas.” 


+ Relationships Among Language Classes: The class co-RP is the set of 
complements of languages in RP. The following containments are known: 
P ⊆ ZPP ⊆ (RP ∩ co-RP). Also, RP ⊆ NP and therefore co-RP ⊆
co-NP.


+ The Primes and NP: Both the primes and the complement of the language
of primes (the composite numbers) are in NP. These facts
make it unlikely that the primes or composite numbers are NP-complete.
Since there are important cryptographic schemes based on primes, such a 
proof would have offered strong evidence of their security. 
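Both certificates can be checked quickly. For a composite n the certificate is just a nontrivial factor. For a prime n, a Pratt-style certificate (Pratt's result is cited in the references as [10]) is a witness a with a^(n−1) ≡ 1 (mod n) but a^((n−1)/q) ≢ 1 (mod n) for every prime q dividing n − 1, so a has order n − 1 and n must be prime, together with recursive certificates that each such q is prime. In this minimal sketch the certificate encoding (a dictionary from each prime to its witness and the prime factorization of p − 1) and the function names are our own hypothetical choices.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical certificate format: for each odd prime p to be certified,
# cert[p] = (a, factors), where a is a witness and `factors` lists the prime
# factors of p - 1 with multiplicity (for example 12 -> [2, 2, 3]).
Certificate = Dict[int, Tuple[int, List[int]]]


def verify_prime(n: int, cert: Certificate) -> bool:
    """Check a Pratt-style certificate that n is prime."""
    if n == 2:
        return True
    if n < 2 or n not in cert:
        return False
    a, factors = cert[n]
    if prod(factors) != n - 1:             # must be a full factorization of n - 1
        return False
    if pow(a, n - 1, n) != 1:              # Fermat condition
        return False
    for q in set(factors):
        if pow(a, (n - 1) // q, n) == 1:   # order of a would be a proper divisor of n - 1
            return False
        if not verify_prime(q, cert):      # each q must itself be certified prime
            return False
    return True


def verify_composite(n: int, d: int) -> bool:
    """Check that d is a nontrivial factor of n, certifying that n is composite."""
    return 1 < d < n and n % d == 0
```

For example, 97 is certified by the witness 5 with 96 = 2·2·2·2·2·3, plus a certificate for 3 (witness 2, with 2 = 2). Pratt showed that such certificates have size polynomial in the number of digits of n, which is what places the primes in NP.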


+ The Primes and RP: The composite numbers are in RP. The random- 
polynomial algorithm for testing compositeness is in common use to allow 
the generation of large primes, or at least large numbers that have an 
arbitrarily small chance of being composite. 
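The summary does not name a specific algorithm here. The sketch below uses the Miller-Rabin test, one standard Monte-Carlo test for compositeness, which may differ in detail from the algorithm discussed in the chapter. For a random base a it checks conditions every odd prime must satisfy; a failure proves n composite, and for a composite n at most a quarter of the bases fail to expose it. The function names, the default of 20 rounds, and the candidate-drawing loop for prime generation are our own assumptions.

```python
import random
from typing import Optional


def is_composite_witness(a: int, n: int) -> bool:
    """Miller-Rabin condition: True if base a proves the odd number n > 3 composite."""
    d, s = n - 1, 0
    while d % 2 == 0:              # write n - 1 = 2**s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return False               # consistent with n being prime
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False           # consistent with n being prime
    return True                    # no prime can behave this way


def probably_composite(n: int, rounds: int = 20,
                       rng: Optional[random.Random] = None) -> bool:
    """Monte-Carlo test for compositeness (n >= 2).

    True means n is certainly composite. False means 'probably prime': for a
    composite n, each round misses with probability at most 1/4.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if n < 4:
        return False               # 2 and 3 are prime
    if n % 2 == 0:
        return True
    if rng is None:
        rng = random.Random()
    return any(is_composite_witness(rng.randrange(2, n - 1), n)
               for _ in range(rounds))


def random_probable_prime(bits: int, rounds: int = 20,
                          rng: Optional[random.Random] = None) -> int:
    """Draw random odd `bits`-bit numbers until one passes every round."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    if rng is None:
        rng = random.Random()
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # force size, oddness
        if not probably_composite(candidate, rounds, rng):
            return candidate
```

A True answer from `probably_composite` is always correct, and a composite slips through all rounds with probability at most 4 to the power −rounds, so the test fits the RP pattern for the composite numbers; `random_probable_prime` is the prime-generation use described above.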


11.7 Gradiance Problems for Chapter 11 


The following is a sample of problems that are available on-line through the 
Gradiance system at www.gradiance.com/pearson. Each of these problems 
is worked like conventional homework. The Gradiance system gives you four 
choices that sample your knowledge of the solution. If you make the wrong 
choice, you are given a hint or advice and encouraged to try the same problem 
again. 


Problem 11.1: In the diagram [shown on-line by the Gradiance system, and 
illustrating the classes P, NP, co-NP, PS, NPS, and recursive] we see certain 
complexity classes (represented as circles or ovals) and certain regions labeled 
A through F that represent the differences of some of these complexity classes. 
The state of our knowledge regarding the existence of problems in the regions 
A-F is imperfect. In some cases, we know that a region is nonempty, and in 
other cases we know that it is empty. Moreover, if P = NP, then we would 
know more about the emptiness or nonemptiness of some of these regions, but 
still would not know everything. Decide what we know about the regions A-F
currently, and also what we would know if P = NP. Then, identify the true 
statement from the list below. 


Problem 11.2: Consider the following problems: 


1. SP (Shortest Paths): given a weighted, undirected graph with nonnegative 
integer edge weights, given two nodes in that graph, and given an integer 
limit k, determine whether the length of the shortest path between the 
nodes is k or less. 


2. WHP (Weighted Hamilton Paths): given a weighted, undirected graph 
with nonnegative integer edge weights, and given an integer limit k, de- 
termine whether the length of the shortest Hamilton path in the graph is 
k or less. 


3. TAUT (Tautologies): given a propositional boolean formula, determine 
whether it is true for all possible truth assignments to its variables. 


4. QBF (Quantified Boolean Formulas): given a boolean formula with quan- 
tifiers for-all and there-exists, such that there are no free variables, deter- 
mine whether the formula is true. 


In the diagram [shown on-line by the Gradiance system, and illustrating the 
classes P, NP, co-NP, PS, NPS, and recursive] are seven regions, P and 
A through F. Place each of the four problems in its correct region, on the 
assumption that NP is equal to neither P nor co-NP nor PS. 


11.8 References for Chapter 11 


Paper [3] initiated the study of classes of languages defined by bounds on the 
amount of space used by a Turing machine. The first PS-complete prob- 
lems were given by Karp [5] in his paper that explored the importance of 
NP-completeness. The PS-completeness of the problem of Exercise 11.3.2,
whether a regular expression is equivalent to Σ*, is from there.

PS-completeness of quantified boolean formulas is unpublished work of L. J. 
Stockmeyer. PS-completeness of the Shannon switching game (Exercise 11.3.3) 
is from [2]. 

The fact that the primes are in NP is by Pratt [10]. The presence of the 
composite numbers in RP was first shown by Rabin [11]. Interestingly, there 
was published at about the same time a proof that the primes are actually in 
P, provided that an unproved, but generally believed, assumption called the 
extended Riemann hypothesis is true [7]. A generation later, a fully polynomial 
algorithm [1] for primality testing was discovered. 

Several books are available to extend your knowledge of the topics intro- 
duced in this chapter. [8] covers randomized algorithms, including the complete 